SuperLectures.com

ICASSP 2011

LOCALIZATION OF NON-LINGUISTIC EVENTS IN SPONTANEOUS SPEECH BY NON-NEGATIVE MATRIX FACTORIZATION AND LONG SHORT-TERM MEMORY

Full Paper at IEEE Xplore

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Presented by: Felix Weninger
Author(s): Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll (Technische Universität München, Germany)

Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
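As a rough illustration of the NMF step mentioned in the abstract, the sketch below estimates non-negative activations for a fixed spectral dictionary and pools them into per-frame, per-class scores that a sequence model such as a BLSTM could take as input. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`nmf_activations`, `class_activation_features`), the choice of the generalized KL divergence with multiplicative updates, the random dictionary, and the per-frame normalization are all hypothetical, and the abstract does not specify how the likelihood features are actually computed.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 such that V is approximately W @ H, with W kept fixed.

    Uses the standard multiplicative update for the generalized
    KL divergence: H <- H * (W.T @ (V / (W @ H))) / (W.T @ 1).
    V: (n_bins, n_frames) non-negative spectrogram.
    W: (n_bins, n_components) non-negative dictionary.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W.T @ 1, shape (n_components, 1)
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / col_sums
    return H


def class_activation_features(H, labels, n_classes):
    """Sum activations per class, then normalize each frame to sum to 1.

    labels[k] is the class index of dictionary component k.
    This normalization is an assumption made for illustration only.
    """
    F = np.zeros((n_classes, H.shape[1]))
    for k, c in enumerate(labels):
        F[c] += H[k]
    return F / (F.sum(axis=0, keepdims=True) + 1e-12)


# Hypothetical toy data: 8 dictionary components over 20 frequency bins,
# two components per class (0 = speech, 1 = laughter,
# 2 = vocal noise, 3 = non-vocal noise).
rng = np.random.default_rng(1)
W = rng.random((20, 8))
labels = [0, 0, 1, 1, 2, 2, 3, 3]
V = W @ rng.random((8, 50))  # synthetic "spectrogram" with 50 frames

H = nmf_activations(V, W)
F = class_activation_features(H, labels, n_classes=4)
print(F.shape)  # (4, 50)
```

In a tandem setup of the kind the abstract describes, a matrix like `F` (one column per frame) would be passed on to the recurrent network, possibly alongside conventional acoustic features; how the features are combined in the paper is not stated here.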

Slides

0:00:16  1. slide
0:00:29  2. slide
0:00:46  3. slide
0:02:25  4. slide
0:03:21  5. slide
0:04:26  6. slide
0:07:45  7. slide
0:09:36  8. slide
0:10:37  9. slide
0:11:55  10. slide
0:12:20  11. slide
0:13:28  12. slide
0:14:31  13. slide
0:15:28  14. slide
0:16:50  15. slide

Links

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5947689

Lecture Information

Recorded:	2011-05-25 14:45 - 15:05, Club D
Added:	19. 6. 2011 17:19
Number of views:	24
Video resolution:	1024x576 px, 512x288 px
Video length:	0:19:47
Audio track:	MP3 [6.69 MB], 0:19:47

Related Lectures

0:11:42

ONLINE DETECTION OF VOCAL LISTENER RESPONSES WITH MAXIMUM LATENCY CONSTRAINTS

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Added: 20. 6. 2011 00:17

0:24:42

AUDIOVISUAL CLASSIFICATION OF VOCAL OUTBURSTS IN HUMAN CONVERSATION USING LONG-SHORT-TERM MEMORY NETWORKS

Audio/Visual Detection of Non-Linguistic Vocal Outbursts

Added: 19. 6. 2011 17:31