Speech Synthesis as a Statistical Machine Learning Problem
Keiichi Tokuda (Nagoya Institute of Technology)
Speech synthesis is often regarded as a messy problem. This talk will discuss how we can formulate the problem of speech synthesis in a statistical machine learning framework. The basic problem of speech synthesis can be stated as follows:
We have a speech database, i.e., a set of speech waveforms and corresponding texts. Given a text to be synthesized, what is the speech waveform corresponding to the text?
The whole text-to-speech generation process can be decomposed into feasible subproblems, which can in turn be combined into a single statistical model for training. One of these subproblems is statistical parametric speech synthesis, which is called "HMM-based speech synthesis" when hidden Markov models (HMMs) are used as the statistical models. The talk will also discuss future challenges and directions in speech synthesis research.
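The problem statement above can be sketched as a predictive distribution. The notation here is an assumption introduced for illustration and is not taken from the abstract: X denotes the speech waveforms in the database, W the corresponding texts, w the text to be synthesized, x the waveform to be generated, and λ a hypothetical set of acoustic model parameters (for example, HMM parameters). The sketch further assumes that x depends on the database only through λ, and that the prior on λ does not depend on W.

```latex
% Synthesis as drawing from a predictive distribution (notation assumed):
\hat{x} \sim p(x \mid w, X, W)

% Introducing model parameters \lambda and marginalizing over them:
p(x \mid w, X, W) = \int p(x \mid w, \lambda)\, p(\lambda \mid X, W)\, d\lambda

% A common point-estimate approximation splits this into two subproblems:
% training:
\hat{\lambda} = \arg\max_{\lambda}\; p(X \mid W, \lambda)\, p(\lambda)
% synthesis:
\hat{x} \sim p(x \mid w, \hat{\lambda})
```

Under these assumptions, training and synthesis appear as two subproblems of the same statistical model, which is one way to read the decomposition described in the abstract.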