Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings
Shiliang Zhang (Alibaba, China), Siqi Zheng (Alibaba, China), Weilong Huang (Alibaba, China), Ming Lei (Alibaba, China), Hongbin Suo (Alibaba, China), Jinwei Feng (Alibaba, USA), Zhijie Yan (Alibaba, China)
In this paper, we propose an overlapping speech detection (OSD) system for real multiparty meetings. Unlike previous work on single-channel recordings or simulated data, we conduct research on real multi-channel data recorded by an 8-microphone array. We investigate how spatial information provided by multi-channel beamforming can benefit OSD. Specifically, we propose a two-stream DFSMN to jointly model acoustic and spatial features. Instead of performing frame-level OSD, we perform segment-level OSD and introduce an attention pooling layer to model speech segments of variable length. Experimental results show that the two-stream DFSMN with attention pooling can effectively model acoustic-spatial features and significantly boost OSD performance, yielding a 3.5% absolute improvement in detection accuracy (from 85.57% to 89.12%) over the baseline system.
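To make the two-stream, segment-level idea concrete, the following is a minimal PyTorch sketch (not the authors' code): two encoders process acoustic and spatial features, an attention pooling layer collapses a variable-length segment into one vector, and a binary classifier decides overlap vs. non-overlap. The DFSMN encoders are replaced with simple feed-forward stacks, and all layer sizes and names are illustrative assumptions.

```python
# Hypothetical sketch of a two-stream segment-level OSD model with attention pooling.
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Scores each frame, softmaxes over valid frames, and returns the weighted mean."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x, mask):
        # x: (batch, frames, dim); mask: (batch, frames), 1 for valid frames, 0 for padding
        logits = self.score(x).squeeze(-1)
        logits = logits.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)
        return (weights * x).sum(dim=1)              # (batch, dim)


class TwoStreamOSD(nn.Module):
    """Encodes acoustic and spatial streams, fuses them, and classifies the segment."""
    def __init__(self, acoustic_dim=80, spatial_dim=40, hidden=256):
        super().__init__()
        self.acoustic_enc = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.spatial_enc = nn.Sequential(nn.Linear(spatial_dim, hidden), nn.ReLU())
        self.pool = AttentionPooling(2 * hidden)
        self.classifier = nn.Linear(2 * hidden, 2)   # overlap vs. non-overlap

    def forward(self, acoustic, spatial, mask):
        fused = torch.cat([self.acoustic_enc(acoustic), self.spatial_enc(spatial)], dim=-1)
        return self.classifier(self.pool(fused, mask))   # segment-level logits


# Example: two segments padded to 120 frames; the second one is only 90 frames long.
model = TwoStreamOSD()
acoustic = torch.randn(2, 120, 80)
spatial = torch.randn(2, 120, 40)
mask = torch.ones(2, 120)
mask[1, 90:] = 0
print(model(acoustic, spatial, mask).shape)          # torch.Size([2, 2])
```

The attention pooling here is what allows segment-level decisions over segments of differing lengths: padding frames are masked out before the softmax, so only real frames contribute to the pooled representation.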