Hierarchical Phone Recognition with Compositional Phonetics (3 minutes introduction)

Xinjian Li (Carnegie Mellon University, USA), Juncheng Li (Carnegie Mellon University, USA), Florian Metze (Carnegie Mellon University, USA), Alan W. Black (Carnegie Mellon University, USA)

There is growing interest in building phone recognition systems for low-resource languages, as the majority of languages do not have a writing system. Phone recognition systems proposed so far typically derive their phone inventory from the training languages, so the derived inventory covers only a limited number of the phones that exist in the world's languages. As a result, these systems fail to recognize unseen phones in low-resource or zero-resource languages. In this work, we tackle this problem with a hierarchical model that explicitly models three entities: phonemes, phones, and phonological articulatory attributes. In particular, we decompose phones into articulatory attributes and compute each phone embedding from its attribute embeddings. The model first predicts a distribution over phones using their embeddings; the language-independent phones are then aggregated into language-dependent phonemes, which are optimized with the CTC loss. This compositional approach enables us to recognize phones even if they do not appear in the training set. We evaluate our model on 47 unseen languages and find that the proposed model outperforms baselines by 13.1% PER.
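
The hierarchy described in the abstract (attributes, then phones, then phonemes) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy phone inventory, the two-dimensional attribute embeddings, summation as the way to compose a phone embedding, dot-product scoring with a softmax, and summing allophone probabilities to obtain phoneme probabilities. The CTC loss and the acoustic encoder are left out.

```python
import math

# Hypothetical toy inventory: each phone is described by articulatory attributes.
PHONE_ATTRIBUTES = {
    "p": ["voiceless", "bilabial", "plosive"],
    "b": ["voiced", "bilabial", "plosive"],
    "t": ["voiceless", "alveolar", "plosive"],
    "d": ["voiced", "alveolar", "plosive"],  # imagine this phone is unseen in training
}

# Hypothetical attribute embeddings; in a real model these would be learned.
ATTRIBUTE_EMBEDDINGS = {
    "voiceless": [1.0, 0.0],
    "voiced": [-1.0, 0.0],
    "bilabial": [0.0, 1.0],
    "alveolar": [0.0, -1.0],
    "plosive": [0.5, 0.5],
}


def phone_embedding(phone):
    """Compose a phone embedding by summing its attribute embeddings (assumed rule)."""
    dim = len(next(iter(ATTRIBUTE_EMBEDDINGS.values())))
    total = [0.0] * dim
    for attribute in PHONE_ATTRIBUTES[phone]:
        for i, value in enumerate(ATTRIBUTE_EMBEDDINGS[attribute]):
            total[i] += value
    return total


def phone_distribution(frame, phones):
    """Softmax over dot products between one frame vector and each phone embedding."""
    scores = [
        sum(f * e for f, e in zip(frame, phone_embedding(phone)))
        for phone in phones
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    normalizer = sum(exps)
    return {phone: x / normalizer for phone, x in zip(phones, exps)}


def phoneme_distribution(phone_probs, allophones):
    """Sum phone probabilities over each phoneme's allophones (assumed aggregation)."""
    return {
        phoneme: sum(phone_probs[phone] for phone in members)
        for phoneme, members in allophones.items()
    }


phones = list(PHONE_ATTRIBUTES)
frame = [-2.0, -2.0]  # hypothetical encoder output for a single frame
phone_probs = phone_distribution(frame, phones)

# Hypothetical language whose phonemes /P/ and /T/ each cover two phones.
allophones = {"P": ["p", "b"], "T": ["t", "d"]}
phoneme_probs = phoneme_distribution(phone_probs, allophones)
```

In this toy setup the phone "d" gets the highest probability for the chosen frame even though no embedding is stored for it directly: its embedding is assembled from attributes shared with the other phones, which is the intuition behind recognizing phones that never appear in the training set.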

Related Recordings

SRI-B End-to-End System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
(3 minutes introduction)

Hardik Sailor, Kiran Praveen T., Vikas Agrawal, Abhinav Jain, Abhishek Pandey

Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
(3 minutes introduction)

Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali

InterSpeech 2021
