A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement (Oral presentation)

Xinlei Ren (Kuaishou Technology, China), Xu Zhang (Kuaishou Technology, China), Lianwu Chen (Kuaishou Technology, China), Xiguang Zheng (Kuaishou Technology, China), Chen Zhang (Kuaishou Technology, China), Liang Guo (Kuaishou Technology, China), Bing Yu (Kuaishou Technology, China)

People are meeting through video conferencing more often. While single-channel speech enhancement techniques are useful for individual participants, speech quality degrades significantly in large meeting rooms, where far-field and reverberant conditions are introduced. Approaches based on microphone array signal processing have been proposed to exploit the inter-channel correlation among the individual microphone channels. In this work, a new causal U-net based multiple-in-multiple-out structure is proposed for real-time multi-channel speech enhancement. The proposed method combines the traditional beamforming structure with the multi-channel causal U-net by explicitly adding a beamforming operation at the end of the neural beamformer. The proposed method was entered in the INTERSPEECH Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing. With 1.97M model parameters and a real-time factor of 0.25 on an Intel Core i7 (2.6 GHz) CPU, the proposed method outperforms the challenge's baseline system on the PESQ, Si-SNR and STOI metrics.
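As a rough illustration of what an explicit beamforming operation at the end of a network could look like, the sketch below applies per-channel complex weights to multi-channel STFT frames and sums over microphones (a generic filter-and-sum step). The abstract does not give the exact formulation, so the function names, the data layout, and the conjugate-weight convention are all assumptions; the weights here are hand-picked stand-ins for whatever the network would estimate. A small helper also shows the usual definition of real-time factor (processing time divided by audio duration).

```python
from typing import List

# Hypothetical layout (an assumption, not from the paper):
# a spectrum is indexed as [frame][frequency bin].
Spectrum = List[List[complex]]


def filter_and_sum(weights: List[Spectrum], mics: List[Spectrum]) -> Spectrum:
    """Apply per-channel complex weights and sum over microphones.

    weights[c][t][f] stands in for network-estimated beamforming weights;
    mics[c][t][f] is the STFT of microphone channel c.
    The conjugate on the weights is a common convention, assumed here.
    """
    n_frames = len(mics[0])
    n_bins = len(mics[0][0])
    out = [[0j] * n_bins for _ in range(n_frames)]
    for w_c, x_c in zip(weights, mics):
        for t in range(n_frames):
            for f in range(n_bins):
                out[t][f] += w_c[t][f].conjugate() * x_c[t][f]
    return out


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1 is faster than real time."""
    return processing_seconds / audio_seconds


# Toy input: 2 microphones, 1 frame, 2 frequency bins.
mics = [[[1 + 1j, 2 + 0j]], [[1 - 1j, 0 + 2j]]]
weights = [[[0.5 + 0j, 0.5 + 0j]], [[0.5 + 0j, 0 + 0.5j]]]
enhanced = filter_and_sum(weights, mics)  # single-channel enhanced spectrum
```

Under this definition, a real-time factor of 0.25 would correspond to, for example, 2.5 seconds of processing for 10 seconds of audio.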

Related Recordings

A Partitioned-Block Frequency-Domain Adaptive Kalman Filter for Stereophonic Acoustic Echo Cancellation
(Oral presentation)

Rui Zhu, Feiran Yang, Yuepeng Li, Shidong Shang

Real-Time Independent Vector Analysis Using Semi-Supervised Nonnegative Matrix Factorization as a Source Model
(Oral presentation)

Taihui Wang, Feiran Yang, Rui Zhu, Jun Yang

InterSpeech 2021
