AUDIOVISUAL CLASSIFICATION OF VOCAL OUTBURSTS IN HUMAN CONVERSATION USING LONG SHORT-TERM MEMORY NETWORKS
Audio/Visual Detection of Non-Linguistic Vocal Outbursts
Presented by: Stavros Petridis. Authors: Florian Eyben (Technische Universität München, Germany); Stavros Petridis (Imperial College London, United Kingdom); Björn Schuller (Technische Universität München, Germany); George Tzimiropoulos, Stefanos Zafeiriou, Maja Pantic (Imperial College London, United Kingdom)
We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, used in this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are fused with typical audio descriptors at the feature level (early fusion). The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
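To make the architecture described in the abstract concrete, below is a minimal sketch of early (feature-level) fusion feeding an LSTM sequence classifier, written in PyTorch. All dimensions (39 audio descriptors, 20 visual shape features, hidden size, number of classes) are illustrative placeholders and not the paper's actual feature sets; the authors' networks may also differ in depth and directionality. This is not the authors' implementation, only an example of the general technique.

```python
import torch
import torch.nn as nn

class EarlyFusionLSTM(nn.Module):
    """Sequence classifier over early-fused audio + visual features.

    Early fusion here means concatenating frame-synchronous audio and
    visual feature vectors into one vector per time step before the
    recurrent layer. Dimensions are hypothetical placeholders.
    """
    def __init__(self, audio_dim=39, visual_dim=20, hidden_dim=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + visual_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, audio_feats, visual_feats):
        # Concatenate per-frame audio and visual vectors: (B, T, A+V).
        fused = torch.cat([audio_feats, visual_feats], dim=-1)
        # LSTM processes the fused sequence: (B, T, H).
        seq_out, _ = self.lstm(fused)
        # Classify from the final time step's hidden state.
        return self.out(seq_out[:, -1])

# Toy usage: a batch of 2 sequences, 100 frames each.
audio = torch.randn(2, 100, 39)
visual = torch.randn(2, 100, 20)
logits = EarlyFusionLSTM()(audio, visual)
print(logits.shape)  # torch.Size([2, 5])
```

The key design point the abstract makes is that fusion happens before the dynamic classifier, so the LSTM can model temporal dependencies across the joint audiovisual feature stream, in contrast to a static SVM applied to individual feature vectors.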
Lecture Information
Recorded: 2011-05-25, 15:05–15:25, Club D
Added: 19 June 2011, 17:31
Video resolution: 1024×576 px, 512×288 px
Video length: 0:24:42
Audio track: MP3 [8.38 MB], 0:24:42