Automatic Error Correction for Speaker Embedding Learning with Noisy Labels (3 minutes introduction)

Fuchuan Tong (Xiamen University, China), Yan Liu (Xiamen University, China), Song Li (Xiamen University, China), Jie Wang (Xiamen University, China), Lin Li (Xiamen University, China), Qingyang Hong (Xiamen University, China)

Deep neural networks have achieved strong performance in speaker verification, but much of that success depends on the availability of large-scale, carefully labeled datasets. In practice, noisy labels often arise during data collection. In this paper, we propose an automatic error correction method for deep speaker embedding learning with noisy labels. Specifically, we propose a label noise correction loss that leverages the model’s generalization capability to correct noisy labels during training. In addition, we improve the vanilla AM-Softmax by introducing sub-centers, which gives a more robust estimate of the speaker posterior. When applied to the VoxCeleb dataset, the proposed method degrades gracefully as noisy labels are introduced. Moreover, when combined with Bayesian estimation of PLDA with noisy training labels at the back-end, the whole system performs better under noisy-label conditions.
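
As a rough illustration of the sub-center idea mentioned in the abstract, the sketch below computes an AM-Softmax posterior in which each speaker has several sub-centers. It is not the authors' implementation: the function names, the max-pooling over sub-centers, and the `margin` and `scale` values are all assumptions made for this example, and the label noise correction loss itself is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def subcenter_am_softmax(embedding, centers, label, margin=0.2, scale=30.0):
    """Return (posterior, loss) for one utterance.

    centers[c] is a list of sub-center vectors for speaker c.
    margin and scale are illustrative values (assumptions), not from the paper.
    """
    # Assumption: a class score is the best cosine match among its sub-centers.
    cos = [max(cosine(embedding, w) for w in subs) for subs in centers]
    # The additive margin is subtracted for the labelled class only.
    logits = [scale * (c - margin if i == label else c) for i, c in enumerate(cos)]
    # Numerically stable softmax over the class logits.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    posterior = [e / total for e in exps]
    loss = -math.log(posterior[label])
    return posterior, loss


# Toy example: two speakers, two sub-centers each, 2-dimensional embeddings.
centers = [[[1.0, 0.0], [0.8, 0.6]], [[0.0, 1.0], [-0.6, 0.8]]]
embedding = [0.8, 0.6]
posterior, loss = subcenter_am_softmax(embedding, centers, label=0)
_, wrong_loss = subcenter_am_softmax(embedding, centers, label=1)
```

In this toy setup the embedding coincides with one sub-center of speaker 0, so the loss under label 0 is much smaller than under label 1. A gap of that kind is the sort of signal a noise-aware training scheme could use to flag a suspicious label.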

InterSpeech 2021

Related Recordings

Presentation matters: Evaluating speaker identification tasks
(longer introduction)

An Integrated Framework for Two-pass Personalized Voice Trigger
(3 minutes introduction)
