ONLINE DETECTION OF VOCAL LISTENER RESPONSES WITH MAXIMUM LATENCY CONSTRAINTS
Audio/Visual Detection of Non-Linguistic Vocal Outbursts
Presented by: Daniel Neiberg, Author(s): Daniel Neiberg, KTH - Royal Institute of Technology, Sweden; Khiet P. Truong, University of Twente, Netherlands
When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of interactivity results in frequent speech overlap, which is common in human-human conversation. To allow the same interactivity between humans and spoken dialog systems, and thereby smoother, more human-like continuous human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34% to 28% by using an energy-based voice activity detector.
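To make the idea of a maximum latency constraint concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a simple frame-energy threshold as the voice activity detector and hands a downstream classifier only the audio that falls within the latency budget after speech onset. The function names, the 10 ms frame size, and the -40 dB threshold are all hypothetical choices made for illustration.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies


def first_speech_frame(energies_db, threshold_db):
    """Index of the first frame whose energy exceeds the threshold, or None."""
    for i, energy in enumerate(energies_db):
        if energy > threshold_db:
            return i
    return None


def decision_window(samples, sample_rate, max_latency_ms,
                    frame_ms=10, threshold_db=-40.0):
    """Return the audio a classifier may use before it must decide.

    Assumption: the latency budget starts at the speech onset found by
    the energy-based detector. Returns None if no speech is detected.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies_db(samples, frame_len)
    onset = first_speech_frame(energies, threshold_db)
    if onset is None:
        return None
    start = onset * frame_len
    budget = sample_rate * max_latency_ms // 1000
    return samples[start:start + budget]
```

With a 16 kHz signal, a 100 ms budget gives the classifier at most 1600 samples after onset, and a 500 ms budget gives it at most 8000. A longer budget exposes more of the utterance before a decision is due, at the cost of a slower system reaction.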
Lecture Information
| Recorded: | 2011-05-25 14:25 - 14:45, Club D |
|---|---|
| Added: | 2011-06-20 00:17 |
| Number of views: | 22 |
| Video resolution: | 1024x576 px, 512x288 px |
| Video length: | 0:11:42 |
| Audio track: | MP3 [3.91 MB], 0:11:42 |