A Tutorial on Pronunciation Modeling for Large Vocabulary Speech Recognition
01 January 2003
Over the past decade, automatic speech recognition (ASR) research has progressed from the recognition of read speech to the recognition of spontaneous conversational speech, prompting some in the field to re-evaluate ASR pronunciation models and their role in capturing the increased phonetic variability within unscripted speech. Two basic approaches for modeling pronunciation variation have emerged: encoding linguistic knowledge to pre-specify possible alternative pronunciations of words, and deriving alternatives directly from a pronunciation corpus. This tutorial is intended to ground the reader in the basic linguistic concepts in phonetics and phonology that guide both of these techniques and to outline several pronunciation modeling strategies that have been employed over the years. The chapter will conclude with a summary of some promising recent research directions.