SuperLectures.com

LOCALIZATION OF NON-LINGUISTIC EVENTS IN SPONTANEOUS SPEECH BY NON-NEGATIVE MATRIX FACTORIZATION AND LONG SHORT-TERM MEMORY

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Full Paper at IEEE Xplore

Speaker: Felix Weninger. Authors: Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll, Technische Universität München, Germany

Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
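
To make the idea of NMF-derived per-frame features more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a fixed, pre-trained non-negative dictionary `W` whose columns are each tagged with a class (for example speech, laughter, noise), estimates the activations `H` with the standard multiplicative update for the Euclidean cost, and then sums and normalizes the activations per class so that each frame yields a likelihood-like vector. The function names `nmf_activations` and `class_features`, the choice of cost function, and the normalization are all assumptions made for illustration.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 such that V ~= W @ H, with the dictionary W kept fixed.

    V: (n_bins, n_frames) non-negative spectrogram.
    W: (n_bins, n_atoms) non-negative dictionary (assumed pre-trained).
    Uses the multiplicative update rule for the Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # update keeps H non-negative
    return H


def class_features(H, atom_labels, n_classes, eps=1e-12):
    """Sum activations per class, then normalize each frame to sum to one.

    atom_labels[k] is the class index of dictionary atom k (hypothetical
    labelling; the source does not specify how atoms map to classes).
    """
    feats = np.zeros((n_classes, H.shape[1]))
    for k, c in enumerate(atom_labels):
        feats[c] += H[k]
    return feats / (feats.sum(axis=0, keepdims=True) + eps)


if __name__ == "__main__":
    # Toy data: 4 frequency bins, 4 atoms (identity dictionary), 2 frames.
    W = np.eye(4)
    V = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 1.0]])
    H = nmf_activations(V, W)
    print(class_features(H, atom_labels=[0, 0, 1, 2], n_classes=3))
```

In a tandem setup of the kind the abstract describes, vectors like these would be fed frame by frame into a sequence classifier such as a BLSTM-RNN, possibly alongside conventional acoustic features; that stage is not shown here.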


  Slides

0:00:16  Slide 1
0:00:29  Slide 2
0:00:46  Slide 3
0:02:25  Slide 4
0:03:21  Slide 5
0:04:26  Slide 6
0:07:45  Slide 7
0:09:36  Slide 8
0:10:37  Slide 9
0:11:55  Slide 10
0:12:20  Slide 11
0:13:28  Slide 12
0:14:31  Slide 13
0:15:28  Slide 14
0:16:50  Slide 15


  Lecture information

Recorded: 2011-05-25 14:45 - 15:05, Club D
Added: 2011-06-19 17:19
Views: 24
Video resolution: 1024x576 px, 512x288 px
Video length: 0:19:47
Audio track: MP3 [6.69 MB], 0:19:47