An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition
01 April 1983
It is generally agreed that information in the speech signal is encoded in the temporal variation of its short-duration power spectrum. Decoding the signal, then, requires techniques both for estimating power spectra and for tracking their changes in time. This paper is concerned with the application of the theory of probabilistic functions of a (hidden) Markov chain to modeling the inherent nonstationarity of the speech signal for the purposes of automatic speech recognition (ASR). The use of hidden Markov models for ASR was proposed by Baker [1,2] and, independently, by a group at IBM [3-12]. The theory on which their work rests is due to Baum et al. [13-17]. It first appeared in the literature several years before Baker's studies and has since been explored in some detail [18,19]. Our previous work in ASR has used temporal alignment procedures based on dynamic programming techniques [20], and we hoped that by studying this body of material, which was new to us, we could improve the performance and/or capabilities of our present ASR systems. Our initial goal, therefore, was to understand the theory of hidden Markov models well enough to implement a new ASR system that could be compared directly with our existing ones. We have, in fact, been able to accomplish that goal, and a description and the results of our experiments are reported in a companion paper [21]. In the course of our studies, we have collected and integrated a number of loosely related mathematical techniques pertinent to Markov modeling.