Extending Pronunciation Dictionary with Automatically Detected Word Mispronunciations to Improve PAII's System for Interspeech 2021 Non-Native Child English Close Track ASR Challenge
(3 minutes introduction)
Wei Chu (PAII, USA), Peng Chang (PAII, USA), Jing Xiao (PAII, USA)
This paper proposes to automatically detect mispronounced words in regions with low Goodness-of-Pronunciation scores using a constrained phone decoder, add these word mispronunciations to the orthodox lexicon without colliding with existing pronunciations, and finally use the expanded lexicon to decode non-native speech. The constrained phone decoder is compiled from an automatically generated phone-level one-edit-distance network, which eliminates the need for extended recognition networks designed by phonologists. Results and analysis show that extending the pronunciation dictionary is effective in improving WER for non-native speech recognition. This paper also describes PAII’s single-pass, fusion-free hybrid system for the Interspeech 2021 non-native children English close track ASR challenge. In particular, it shows the effective use of non-speech segments in the training set as noise sources for noise augmentation of the training data, and it compares and analyzes acoustic models with different neural network architectures. The final system achieves WERs of 12.10%/28.25% on the development/evaluation sets, compared with 13.37%/33.51% for a well-optimized baseline.
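To make the two central ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It enumerates the phone sequences that lie one edit (substitution, deletion, or insertion) away from a canonical pronunciation, which is the candidate space a one-edit-distance network would cover, and then adds detected mispronunciations to a lexicon while skipping any that collide with an existing pronunciation. The function names `one_edit_variants` and `extend_lexicon`, the toy phone set, and the reading of "collision" as "the phone sequence is already a pronunciation of some word" are all assumptions made for illustration. The actual system compiles the network into a decoder and selects regions by Goodness-of-Pronunciation score, neither of which is modeled here.

```python
from typing import Dict, List, Set, Tuple

Pron = Tuple[str, ...]  # a pronunciation is a tuple of phone symbols


def one_edit_variants(pron: Pron, phone_set: List[str]) -> Set[Pron]:
    """Return all phone sequences exactly one edit away from `pron`."""
    variants: Set[Pron] = set()
    for i in range(len(pron)):
        # Deletion of phone i (never produce an empty pronunciation).
        if len(pron) > 1:
            variants.add(pron[:i] + pron[i + 1:])
        # Substitution of phone i by any other phone.
        for p in phone_set:
            if p != pron[i]:
                variants.add(pron[:i] + (p,) + pron[i + 1:])
    # Insertion of any phone at any position, including both ends.
    for i in range(len(pron) + 1):
        for p in phone_set:
            variants.add(pron[:i] + (p,) + pron[i:])
    return variants


def extend_lexicon(
    lexicon: Dict[str, Set[Pron]],
    detected: List[Tuple[str, Pron]],
) -> Dict[str, Set[Pron]]:
    """Add detected (word, mispronunciation) pairs, skipping collisions.

    Assumption: a collision means the phone sequence is already a
    pronunciation of some word in the lexicon.
    """
    existing = {pron for prons in lexicon.values() for pron in prons}
    extended = {word: set(prons) for word, prons in lexicon.items()}
    for word, pron in detected:
        if pron in existing:
            continue  # would be confusable with an existing entry
        extended.setdefault(word, set()).add(pron)
        existing.add(pron)
    return extended


if __name__ == "__main__":
    phones = ["TH", "S", "T", "IH", "NG", "K"]  # toy phone set
    lexicon = {
        "think": {("TH", "IH", "NG", "K")},
        "sink": {("S", "IH", "NG", "K")},
    }
    candidates = one_edit_variants(("TH", "IH", "NG", "K"), phones)
    # Pretend the decoder detected these two candidates for "think".
    detected = [
        ("think", ("S", "IH", "NG", "K")),  # collides with "sink"
        ("think", ("T", "IH", "NG", "K")),  # new, gets added
    ]
    assert all(pron in candidates for _, pron in detected)
    for word, prons in sorted(extend_lexicon(lexicon, detected).items()):
        for pron in sorted(prons):
            print(word, " ".join(pron))
```

In this toy example the variant "S IH NG K" is rejected for "think" because it is already the pronunciation of "sink", while "T IH NG K" is accepted as an additional pronunciation.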