On the use of phone-gram units in recurrent neural networks for language identification

Christian Salamea, Luis Fernando D'Haro, Ricardo Cordoba, Rubén San-Segundo

In this paper we present results on language identification (LID) using scores from RNN-based language models (RNNLMs) trained on phone-grams of different orders and obtained with different phonetic ASR recognizers. To avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, we apply a K-means clustering procedure to phone-vector embeddings as a pre-processing step. We also present additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding. We worked with the KALAKA-3 database under the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs approximately 13% better than the unigram-based RNNLM system. The resulting RNNLM scores are also calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion yields further improvements, showing that the different scores provide complementary information to the LID system.
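The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy phone inventory, the 2-dimensional phone vectors, the choice to build a phone-gram vector by concatenating its phone vectors, and the names `kmeans`, `assign`, and `gram_to_class` are all assumptions made for illustration. The idea shown is only that every possible phone-gram is mapped to one of a small number of cluster IDs, which then replace the raw phone-grams as the RNNLM vocabulary.

```python
import itertools
import numpy as np


def assign(vectors, centroids):
    """Return the index of the nearest centroid for every row of `vectors`."""
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = vectors[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return assign(vectors, centroids), centroids


# Hypothetical toy phone embeddings (assumption: real ones would be learned).
phone_vecs = {
    "a": np.array([0.0, 0.1]),
    "e": np.array([0.1, 0.0]),
    "s": np.array([5.0, 5.1]),
    "t": np.array([5.1, 5.0]),
}

# Enumerate every possible bigram; assumption: a phone-gram vector is the
# concatenation of the vectors of its phones.
n = 2
grams = list(itertools.product(phone_vecs, repeat=n))
gram_matrix = np.stack([np.concatenate([phone_vecs[p] for p in g]) for g in grams])

# Cluster the 16 bigrams into 4 classes and build the lookup table.
k = 4
labels, _ = kmeans(gram_matrix, k)
gram_to_class = {g: int(label) for g, label in zip(grams, labels)}

# Rewrite a recognised phone sequence as a sequence of class IDs.
utterance = ["s", "a", "t", "e", "a"]
class_seq = [gram_to_class[tuple(utterance[i:i + n])]
             for i in range(len(utterance) - n + 1)]
print(class_seq)
```

With this mapping the language-model vocabulary is bounded by the number of clusters `k` rather than by the number of possible phone-grams, which grows exponentially with the phone-gram order.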

Odyssey 2016

The Speaker and Language Recognition Workshop
