SuperLectures.com

AUDIOVISUAL CLASSIFICATION OF VOCAL OUTBURSTS IN HUMAN CONVERSATION USING LONG-SHORT-TERM MEMORY NETWORKS

Full Paper at IEEE Xplore

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Presenter: Stavros Petridis. Authors: Florian Eyben, Technische Universität München, Germany; Stavros Petridis, Imperial College London, United Kingdom; Björn Schuller, Technische Universität München, Germany; George Tzimiropoulos, Stefanos Zafeiriou, Maja Pantic, Imperial College London, United Kingdom

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual Interest Corpus of natural human-to-human conversation. For video-based analysis we compare shape-based and appearance-based features, which are fused at the feature level (early fusion) with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
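The early fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it only shows the general idea of frame-level feature concatenation, where synchronized audio descriptors and visual shape features are joined into one vector per frame before the fused sequence is passed to a dynamic classifier such as an LSTM network. All dimensions and names below are illustrative assumptions.

```python
import numpy as np

def early_fusion(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-frame audio and visual features (early fusion).

    audio_feats:  (T, Da) array, one audio descriptor vector per frame
    visual_feats: (T, Dv) array, one visual feature vector per frame
    Returns a (T, Da + Dv) fused feature sequence.
    """
    if audio_feats.shape[0] != visual_feats.shape[0]:
        raise ValueError("audio and visual streams must be frame-synchronized")
    return np.concatenate([audio_feats, visual_feats], axis=1)

# Hypothetical example: a 100-frame utterance with 13 audio descriptors
# and 20 visual shape features per frame.
audio = np.random.randn(100, 13)
shape = np.random.randn(100, 20)
fused = early_fusion(audio, shape)
print(fused.shape)  # (100, 33)
```

The fused (T, 33) sequence would then be fed frame by frame to a sequence classifier; the concatenation itself is the entire "early" fusion step, in contrast to late fusion, which combines the classifiers' outputs instead.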


  Speech transcript


  Slides


0:00:16  Slide 1
0:00:36  Slide 2
0:02:43  Slide 3
0:03:21  Slide 4
0:05:40  Slide 5
0:06:27  Slide 6
0:06:44  Slide 7
0:07:19  Slide 8
0:07:59  Slide 9
0:09:11  Slide 10
0:10:15  Slide 11
0:11:41  Slide 12
0:12:17  Slide 13
0:12:32  Slide 14
0:13:14  Slide 15
0:13:56  Slide 16
0:16:08  Slide 17
0:16:49  Slide 18
0:17:06  Slide 16
0:17:20  Slide 18
0:18:21  Slide 19


  Lecture information

Recorded: 2011-05-25, 15:05 - 15:25, Club D
Added: 19 June 2011, 17:31
Views: 33
Video resolution: 1024x576 px, 512x288 px
Video length: 0:24:42
Audio track: MP3 [8.38 MB], 0:24:42