Speech Emotion Recognition with Multi-task Learning
Xingyu Cai (Baidu, USA), Jiahong Yuan (Baidu, USA), Renjie Zheng (Baidu, USA), Liang Huang (Baidu, USA), Kenneth Church (Baidu, USA)
Speech emotion recognition (SER) classifies speech into emotion categories such as Happy, Angry, Sad, and Neutral. Recently, deep learning has been applied to the SER task. This paper proposes a multi-task learning (MTL) framework that simultaneously performs speech-to-text recognition and emotion classification, using an end-to-end deep neural model based on wav2vec-2.0. Experiments on the IEMOCAP benchmark show that the proposed method achieves state-of-the-art performance on the SER task. In addition, an ablation study establishes the effectiveness of the proposed MTL framework.
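A minimal sketch of the multi-task setup described above, assuming a shared wav2vec 2.0 encoder with a CTC head for speech-to-text and a pooled classification head for emotion. The vocabulary size, number of emotion classes, mean pooling, pretrained checkpoint name, and the loss weight `alpha` are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: joint ASR (CTC) + emotion classification on a wav2vec 2.0 encoder.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskSER(nn.Module):
    def __init__(self, vocab_size=32, num_emotions=4, alpha=0.1,
                 pretrained="facebook/wav2vec2-base"):  # assumed checkpoint
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(pretrained)
        hidden = self.encoder.config.hidden_size
        self.ctc_head = nn.Linear(hidden, vocab_size)        # speech-to-text branch
        self.emotion_head = nn.Linear(hidden, num_emotions)  # emotion branch
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss()
        self.alpha = alpha  # assumed weight on the auxiliary ASR loss

    def forward(self, waveforms, text_targets, text_lengths, emotion_labels):
        feats = self.encoder(waveforms).last_hidden_state           # (B, T, H)
        # ASR branch: frame-wise token log-probabilities trained with CTC.
        log_probs = self.ctc_head(feats).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        input_lengths = torch.full((feats.size(0),), feats.size(1), dtype=torch.long)
        loss_asr = self.ctc_loss(log_probs, text_targets, input_lengths, text_lengths)
        # SER branch: utterance-level mean pooling, then emotion logits.
        emo_logits = self.emotion_head(feats.mean(dim=1))
        loss_ser = self.ce_loss(emo_logits, emotion_labels)
        # Joint objective: emotion loss plus weighted ASR loss.
        return loss_ser + self.alpha * loss_asr, emo_logits
```

Combining the two losses lets the transcription objective act as an auxiliary signal that regularizes the shared encoder; the relative weight between the two tasks is a tunable hyperparameter.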