TEACHER-STUDENT MIXIT FOR UNSUPERVISED AND SEMI-SUPERVISED SPEECH SEPARATION
(3 minutes introduction)

Jisi Zhang (University of Sheffield, UK), Cătălin Zorilă (Toshiba, UK), Rama Doddipatla (Toshiba, UK), Jon Barker (University of Sheffield, UK)

In this paper, we introduce a novel semi-supervised learning framework for end-to-end speech separation. The proposed method first trains a teacher model on mixtures of unseparated sources using the mixture invariant training (MixIT) criterion. The teacher model then estimates separated sources, which are used to train a student model with standard permutation invariant training (PIT). The student model can be fine-tuned with supervised data, i.e., paired artificial mixtures and clean speech sources, and further improved via model distillation. Experiments with single- and multi-channel mixtures show that the teacher-student training resolves the over-separation problem observed in the original MixIT method. Further, the semi-supervised performance is comparable to that of a fully supervised separation system trained using ten times the amount of supervised data.
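To make the two training criteria named in the abstract concrete, here is a minimal, standard-library-only sketch of a MixIT-style loss (for the teacher) and a PIT-style loss (for the student). It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the signal-level loss, so negative SNR is assumed here, and the function names `neg_snr`, `mixit_loss`, and `pit_loss` are hypothetical.

```python
import itertools
import math


def neg_snr(ref, est, eps=1e-8):
    """Negative SNR in dB between two equal-length signals (lower is better).

    Assumption: the abstract does not name the signal-level loss;
    negative SNR is used here purely for illustration.
    """
    signal = sum(r * r for r in ref)
    noise = sum((r - e) ** 2 for r, e in zip(ref, est))
    return -10.0 * math.log10((signal + eps) / (noise + eps))


def mixit_loss(mixtures, est_sources):
    """MixIT-style loss: assign each estimated source to exactly one
    reference mixture, remix, and keep the best-scoring assignment."""
    n_mix, n_samples = len(mixtures), len(mixtures[0])
    best = math.inf
    # Enumerate every way of assigning each estimated source to a mixture.
    for assignment in itertools.product(range(n_mix), repeat=len(est_sources)):
        remix = [[0.0] * n_samples for _ in range(n_mix)]
        for src, mix_idx in zip(est_sources, assignment):
            for t, value in enumerate(src):
                remix[mix_idx][t] += value
        loss = sum(neg_snr(m, r) for m, r in zip(mixtures, remix))
        best = min(best, loss)
    return best


def pit_loss(targets, est_sources):
    """PIT-style loss: score every pairing of targets with estimates and
    keep the best one. In the teacher-student setup described above, the
    targets would be the teacher's separated outputs."""
    best = math.inf
    for perm in itertools.permutations(range(len(est_sources))):
        total = sum(neg_snr(t, est_sources[p]) for t, p in zip(targets, perm))
        best = min(best, total / len(targets))
    return best
```

Both functions search exhaustively, which is only practical for a handful of sources. The difference between them is the search space: the MixIT-style loss needs only mixtures as references and searches over source-to-mixture assignments, while the PIT-style loss needs one target per source and searches over one-to-one pairings.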

Related Recordings

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
(3 minutes introduction)

Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Few-shot learning of new sound classes for target sound extraction
(3 minutes introduction)

Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki

InterSpeech 2021
