Improving Robustness of Speaker Verification Against Mimicked Speech
Kuruvachan K George, Santhosh Kumar C, Ramachandran K I, Ashish Panda |
---|
Making speaker verification (SV) systems robust to spoofed/mimicked speech attacks is very important to make its use effective in security applications. In this work, we show that using a proximal support vector machine backend classifier with i-vectors as inputs (i-PSVM) can help improve the performance of SV systems for mimicked speech as non-target trials. We compared our results with the state-of-the-art baseline i-vector with cosine distance scoring (i-CDS), i-vector with a backend SVM classifier (i-SVM) and cosine distance features with an SVM backend classifier (CDF-SVM) systems. In all experiments with SVM backend classifier, we over sampled the target utterance feature vectors before i-vector extraction using utterance partition followed by acoustic vector resampling (UP-AVR). UP-AVR helps solve the data imbalance problem, with a large number of non-target examples from the development data for training the models. In i-PSVM, proximity of the test utterance to the target and non-target class is the criteria for decision making while in i-SVM, the distance from the separating hyperplane is the criteria for the decision. It was seen that the i-PSVM approach is advantageous when tested with mimicked speech as non-target trials. This highlights that proximity to the target speakers is a better criteria for speaker verification for mimicked speech. Further, we note that weighting the target and non-target class examples helps us further fine tune the performance of i-PSVM. We then devised a strategy for estimating the weights for every example based on its cosine distance similarity with respect to the centroid of target class examples. The final i-PSVM with example based weighting scheme achieved an improvement of 3.39% absolute in EER when compared to the best baseline system, i-SVM. Subsequently, we fused the i-PSVM and i-SVM systems and results show that the performance of the combined system is better than the individual systems.