Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator
(3 minutes introduction)

Kazuki Mizuta (University of Tokyo, Japan), Tomoki Koriyama (University of Tokyo, Japan), Hiroshi Saruwatari (University of Tokyo, Japan)

This paper proposes Harmonic WaveGAN, a GAN-based waveform generation model that focuses on the harmonic structure of speech waveforms. The proposed model uses two discriminators to capture the characteristics of a speech waveform in the time domain and in the frequency domain, respectively. One of them, the harmonic structure discriminator, includes a 2-D convolution layer called “harmonic convolution” that models the harmonic structure of a speech waveform. Although harmonic convolution has been shown to perform well in audio restoration tasks, it has not yet been fully explored in speech synthesis. We therefore seek to improve the perceptual quality of speech synthesized by the waveform generation model and to investigate the usefulness of harmonic convolution for speech synthesis. Mean opinion score tests showed that Harmonic WaveGAN can synthesize more natural speech than the conventional Parallel WaveGAN. We also showed that spectrograms of speech synthesized by our model exhibit a clearer harmonic structure than those of speech synthesized by the original Parallel WaveGAN.
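
The abstract does not define harmonic convolution, so the following is only a rough illustration of the general idea, not the authors' layer. It assumes the commonly described formulation in which the output at frequency bin f aggregates the input at integer multiples k·f of that bin, combined with an ordinary kernel along the time axis. The function name `harmonic_conv2d`, the `anchor` parameter, and the toy spectrogram are all hypothetical choices made for this sketch.

```python
import numpy as np

def harmonic_conv2d(spec, kernel, anchor=1):
    """Toy single-channel harmonic convolution (illustrative assumption).

    spec:   (F, T) magnitude spectrogram.
    kernel: (H, K) weights over H harmonics and K time taps.
    anchor: output bin f reads input bins floor(k * f / anchor), k = 1..H.
    """
    n_freq, n_time = spec.shape
    n_harm, n_taps = kernel.shape
    pad = n_taps // 2
    # Zero-pad the time axis so the output keeps T frames.
    padded = np.pad(spec, ((0, 0), (pad, pad)))
    out = np.zeros((n_freq, n_time), dtype=float)
    freqs = np.arange(n_freq)
    for k in range(1, n_harm + 1):
        src = (k * freqs) // anchor      # harmonic source bin for each output bin
        valid = src < n_freq             # harmonics above the top bin contribute zero
        for j in range(n_taps):
            # Cross-correlation along time, as in typical deep learning conv layers.
            out[valid] += kernel[k - 1, j] * padded[src[valid], j:j + n_time]
    return out

# Toy spectrogram: a voiced frame with fundamental at bin 10 and two overtones.
spec = np.zeros((64, 4))
spec[[10, 20, 30], :] = 1.0

# Equal weights on the first three harmonics, one time tap.
out = harmonic_conv2d(spec, np.ones((3, 1)))
peak_bin = int(out[:, 0].argmax())
```

In this toy case the response peaks at the fundamental bin, because only there do all three harmonic taps land on energy at once. A standard 2-D convolution over adjacent bins would not line up with the overtones in this way, which is the intuition behind using such a layer in a discriminator that judges harmonic structure.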

Related Recordings

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis
(3 minutes introduction)

Jian Cong, Shan Yang, Lei Xie, Dan Su

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
(3 minutes introduction)

Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Seong-Whan Lee

InterSpeech 2021
