AUDIOVISUAL CLASSIFICATION OF VOCAL OUTBURSTS IN HUMAN CONVERSATION USING LONG-SHORT-TERM MEMORY NETWORKS

Full Paper at IEEE Xplore

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Presented by: Stavros Petridis
Authors: Florian Eyben (Technische Universität München, Germany); Stavros Petridis (Imperial College London, United Kingdom); Björn Schuller (Technische Universität München, Germany); George Tzimiropoulos, Stefanos Zafeiriou, Maja Pantic (Imperial College London, United Kingdom)

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) recurrent neural networks as highly successful dynamic sequence classifiers. The Audiovisual Interest Corpus of natural human-to-human conversation, used in this year's Paralinguistic Challenge, serves as the evaluation database. For video-based analysis we compare shape-based and appearance-based features, which are fused at the feature level (early fusion) with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
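
As a rough illustration of the approach described above, the sketch below shows an LSTM sequence classifier with feature-level (early) audiovisual fusion, in which per-frame audio and visual feature vectors are concatenated before entering the recurrent layer. This is a minimal PyTorch sketch under assumed settings: the class name, feature dimensions, hidden size, and number of classes are illustrative placeholders, not the features or configuration used in the paper.

```python
import torch
import torch.nn as nn

class EarlyFusionLSTM(nn.Module):
    """Illustrative LSTM classifier with early (feature-level) fusion.

    Per-frame audio and visual features are concatenated into a single
    vector per time step before the recurrent layer, so the LSTM can
    model temporal dynamics across both modalities jointly. All
    dimensions are placeholder assumptions, not the paper's values.
    """

    def __init__(self, audio_dim=39, video_dim=20, hidden_dim=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + video_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, time, audio_dim)
        # video_feats: (batch, time, video_dim), assumed frame-synchronised
        fused = torch.cat([audio_feats, video_feats], dim=-1)  # early fusion
        outputs, _ = self.lstm(fused)       # (batch, time, hidden_dim)
        return self.classifier(outputs)     # per-frame class scores

# Toy usage: one 100-frame audiovisual sequence.
model = EarlyFusionLSTM()
logits = model(torch.randn(1, 100, 39), torch.randn(1, 100, 20))
print(logits.shape)  # torch.Size([1, 100, 4])
```

By contrast, the static Support Vector Machine baseline mentioned in the abstract classifies each frame (or window statistic) independently, without the recurrent state that lets the LSTM accumulate temporal context.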


  Slides

0:00:16   1. slide
0:00:36   2. slide
0:02:43   3. slide
0:03:21   4. slide
0:05:40   5. slide
0:06:27   6. slide
0:06:44   7. slide
0:07:19   8. slide
0:07:59   9. slide
0:09:11  10. slide
0:10:15  11. slide
0:11:41  12. slide
0:12:17  13. slide
0:12:32  14. slide
0:13:14  15. slide
0:13:56  16. slide
0:16:08  17. slide
0:16:49  18. slide
0:17:06  16. slide
0:17:20  18. slide
0:18:21  19. slide

  Lecture Information

Recorded: 2011-05-25 15:05 - 15:25, Club D
Added: 2011-06-19 17:31
Number of views: 33
Video resolution: 1024x576 px, 512x288 px
Video length: 0:24:42
Audio track: MP3 [8.38 MB], 0:24:42