AN INVESTIGATION OF SUBSPACE MODELING FOR PHONETIC AND SPEAKER VARIABILITY IN AUTOMATIC SPEECH RECOGNITION

Acoustic Modeling

Presented by: Richard Rose, Author(s): Richard Rose, Shou-Chun Yin, Yun Tang, McGill University, Canada

This paper investigates the impact of subspace-based techniques for modeling speaker variability and phonetic variability in automatic speech recognition (ASR). There are many well-known approaches to speaker-space-based adaptation, which represent sources of variability as a projection within a low-dimensional subspace. A new approach to acoustic modeling in ASR, referred to as the subspace Gaussian mixture model (SGMM), represents phonetic variability as a set of projections applied at the state level in a hidden Markov model (HMM) based acoustic model. The impact of the SGMM in modeling these intrinsic sources of variability is evaluated for a continuous speech recognition (CSR) task where the performance of continuous density HMM (CDHMM) based ASR systems is already reasonably good. Speaker-independent SGMM-based ASR was shown to provide an 18% reduction in word error rate (WER) over the CDHMM and a 5% reduction in WER over unsupervised speaker adaptation in the Resource Management CSR domain.
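
The idea of state-level projections can be illustrated with a small sketch. It is not taken from the talk; it assumes the formulation commonly given for the SGMM, in which each HMM state j holds a low-dimensional vector v_j, and the mean and weight of each shared Gaussian i are derived as M_i v_j and a softmax over w_i . v_j. The function names, the toy dimensions, and all values are hypothetical, and the shared covariances are left out for brevity.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sgmm_state_params(M_list, w_list, v_j):
    """Derive per-state Gaussian means and mixture weights from a state vector.

    M_list: one (D x S) projection matrix per shared Gaussian (assumed globally shared)
    w_list: one length-S weight projection vector per shared Gaussian
    v_j:    length-S state vector, the only state-specific parameter here
    """
    # Mean of Gaussian i in state j: projection of the state vector.
    means = [matvec(M_i, v_j) for M_i in M_list]
    # Mixture weights: softmax over the projected scores, shifted for stability.
    logits = [sum(w * x for w, x in zip(w_i, v_j)) for w_i in w_list]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# Toy setup: 2 shared Gaussians, feature dimension 3, subspace dimension 2.
M_list = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
]
w_list = [[0.0, 0.0], [0.0, 0.0]]
v_j = [0.5, -1.0]

means, weights = sgmm_state_params(M_list, w_list, v_j)
```

Under these assumptions, each state contributes only a short vector, while the projection matrices are shared across all states. This is the sense in which phonetic variability is confined to a low-dimensional subspace.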
