Analysis of Teager Energy Profiles for Spoof Speech Detection
Madhu Kamble, Aditya Krishna Sai Pulikonda, Maddala Venkata Siva Krishna, Hemant Patil
Recent advances in technology pose a threat to Automatic Speaker Verification (ASV) systems through spoofing attacks such as voice conversion (VC), speech synthesis (SS), and replay. To enhance the security of ASV systems, efficient anti-spoofing algorithms are needed to distinguish spoofed speech signals from natural ones. In this paper, we exploit Teager energy-based features for the spoof speech detection (SSD) task. The Teager energy profiles computed for natural, VC, SS, and replay signals show changes around the Glottal Closure Instants (GCIs). In particular, for the SS signal, the bumps are very smooth compared to those of the natural signal. These variations in the Teager energy profiles around the GCIs help to discriminate spoofed signals from their natural counterparts. The experiments are performed on the ASVspoof 2015 and BTAS 2016 challenge databases. The Teager energy-based feature set, i.e., Teager Energy Cepstral Coefficients (TECC), performs well for the S1-S9 spoofing algorithms, obtaining an average EER of 0.161 % (but not for S10, where the EER is 58.14 %), whereas the state-of-the-art features, namely, Cochlear Filter Cepstral Coefficients-Instantaneous Frequency (CFCC-IF) and Constant-Q Cepstral Coefficients (CQCC), gave EERs of 0.39 % and 0.163 %, respectively. It is interesting to note that the significant negative result of the proposed feature set for S10 vs. natural speech confirms the capability of TECC to represent the characteristics of the airflow pattern during natural speech production. Furthermore, the experiments performed on the BTAS 2016 challenge dataset gave an EER of 2.25 % on the development set. On the evaluation set, the TECC feature set gave a Half Total Error Rate (HTER), the metric provided by the challenge organizers, of 3.7 %, thus outperforming the baseline by a noticeable margin of 3.16 %.
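The abstract does not state the operator behind the Teager energy profiles. As a minimal sketch, assuming the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] x[n+1], a profile can be computed as below. The function name `teager_energy` and the test tone are illustrative choices, not taken from the paper, and the filterbank and cepstral stages that would turn such profiles into TECC features are not shown.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator (assumed standard form).

    psi[n] = x[n]^2 - x[n-1] * x[n+1]

    Returns a list of len(x) - 2 values; the first and last samples
    are dropped because they lack a neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Sanity check on a pure tone A*cos(omega*n): for this input the
# operator yields the constant A^2 * sin^2(omega) at every sample.
A, omega = 2.0, 0.3  # hypothetical amplitude and digital frequency (rad/sample)
tone = [A * math.cos(omega * n) for n in range(50)]
profile = teager_energy(tone)
expected = (A * math.sin(omega)) ** 2
```

For a single sinusoid the profile is flat, so any bumps in the profile of a speech signal reflect changes in the signal's amplitude and frequency content, which is the kind of structure around the GCIs that the abstract describes.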