Comparing Speech Enhancement Techniques for Voice Adaptation-Based Speech Synthesis
(3 minutes introduction)

Nicholas Eng (University of Auckland, New Zealand), C.T. Justine Hui (University of Auckland, New Zealand), Yusuke Hioka (University of Auckland, New Zealand), Catherine I. Watson (University of Auckland, New Zealand)

This study investigates the use of speech enhancement techniques for creating text-to-speech voices from degraded or noisy speech. A number of synthetic voices were created using speech that was first degraded by different noise types at various signal-to-noise ratios (SNRs) and then enhanced with one of four speech enhancement algorithms: subspace, Wiener filter, SEGAN, and a DNN-based method. Subjective listening tests show that, at low SNRs, synthetic voices built from speech enhanced by the subspace or the DNN-based method are of higher quality than voices built from speech enhanced by the Wiener filter or SEGAN, and that speech enhanced by the subspace method yields higher quality synthetic speech at higher SNRs.
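The abstract does not say how the noisy training speech was produced, so the following is only a minimal sketch of one common way to degrade a clean signal at a chosen SNR: scale the noise so that the ratio of mean speech power to mean noise power matches the target, then add it. The function names `mean_power` and `mix_at_snr` are hypothetical, and looping the noise to match the speech length is an assumption, not the authors' procedure.

```python
import math

def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Return (noisy, scaled_noise) where noise is scaled to the target SNR in dB.

    Assumption: the noise is looped to cover the full speech length.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise so it has exactly as many samples as the speech.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = mean_power(speech)
    noise_power = mean_power(looped)
    if noise_power == 0.0:
        raise ValueError("noise has zero power and cannot be scaled")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in looped]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled

# Usage: a synthetic tone stands in for speech, a short repeating pattern for noise.
speech = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
noise = [0.3, -0.7, 0.5, -0.1, 0.9, -0.4, 0.2]
noisy, scaled = mix_at_snr(speech, noise, snr_db=5.0)
achieved = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
```

Here `achieved` is the SNR recomputed from the scaled noise, which gives a quick check that the mixture was built at the requested level before it is passed to any enhancement algorithm.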

InterSpeech 2021