Adversarial Voice Conversion against Neural Spoofing Detectors
(3-minute introduction)
Yi-Yang Ding (USTC, China), Li-Juan Liu (iFLYTEK, China), Yu Hu (USTC, China), Zhen-Hua Ling (USTC, China)
The naturalness and similarity of voice conversion have improved significantly in recent years with the development of deep-learning-based conversion models and neural vocoders. Accordingly, the task of detecting spoofed speech has also attracted research attention. In the recent ASVspoof 2019 challenge, the best spoofing detection models could distinguish most artificial utterances from natural ones. Inspired by recent progress in adversarial example generation, this paper proposes an adversarial post-processing network (APN) that generates adversarial examples against a neural-network-based spoofing detector via a white-box attack. The APN model post-processes the speech waveforms generated by a baseline voice conversion system. An adversarial loss derived from the spoofing detector, together with two regularization losses, is applied to optimize the parameters of the APN. Experimental results on the logical access (LA) dataset of ASVspoof 2019 show that the proposed method effectively improves the adversarial ability of converted speech against spoofing detectors based on light convolutional neural networks (LCNNs) without degrading its subjective quality.
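As a rough illustration of the training scheme described in the abstract, the PyTorch-style sketch below optimizes a hypothetical post-processing network with an adversarial loss taken from a frozen spoofing detector plus two placeholder regularization terms. The APN architecture, the specific regularizers, and the detector interface are assumptions for illustration only, not the paper's actual design.

```python
import torch
import torch.nn as nn

class APN(nn.Module):
    """Hypothetical post-processing network: adds a learned residual
    perturbation to a converted waveform of shape (batch, 1, samples)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
        )

    def forward(self, wav):
        return wav + self.net(wav)  # residual perturbation

def training_step(apn, detector, wav, optimizer,
                  lambda_wave=1.0, lambda_energy=0.1):
    """One white-box training step.

    `detector` is assumed to be a frozen spoofing detector (e.g. an LCNN)
    returning the probability that its input is bona fide; its gradients
    are used for the attack but its weights are never updated.
    """
    adv_wav = apn(wav)

    # Adversarial loss: push the frozen detector toward "bona fide".
    bona_fide_prob = detector(adv_wav)
    adv_loss = -torch.log(bona_fide_prob + 1e-8).mean()

    # Two illustrative regularizers keeping the perturbation small;
    # the paper's actual regularization losses may differ.
    wave_loss = torch.mean((adv_wav - wav) ** 2)
    energy_loss = torch.mean(torch.abs(adv_wav.pow(2).mean(-1)
                                       - wav.pow(2).mean(-1)))

    loss = adv_loss + lambda_wave * wave_loss + lambda_energy * energy_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the APN's parameters are passed to the optimizer, so the detector acts purely as a differentiable white-box target while the regularization terms discourage audible changes to the converted waveform.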