An Automatic Segmentation Procedure for Segment-Based Speech Recognition

05 September 1986

Although the word-based (WB) approach to speech recognition is popular for its simple implementation and good performance on small-to-medium-vocabulary, isolated-word recognition tasks, it cannot easily be extended to large vocabularies or to continuous speech. For large-vocabulary speech recognition, a large amount of training data, proportional to the vocabulary size N, is needed to characterize each individual word model. In continuous speech recognition, the training data needed to characterize word junctures is even more demanding, on the order of N², since each of the N words can in principle be followed by any of the N words. To overcome these difficulties, a segment-based (SB) approach built on subword units is a viable alternative to the WB approach. However, preparing a subword segment inventory of reasonable size, say 200-1000 entries, is not a trivial task. Manual segmentation can be used, but it has two major drawbacks: i) the process is laborious and tedious, requiring, for example, extensive spectrogram reading and listening; and ii) because there is no objective criterion, manual procedures inevitably exhibit some inconsistencies. To avoid these problems, we propose an automatic procedure for segmenting speech into subword units for use in a segment-based speech recognition task.