A Hierarchical Approach for Better Estimation of Unseen Event Likelihood in Speech Recognition

01 January 2003

Backoff hierarchical class n-gram language models (LMs) are a generalization of the common backoff word n-gram LMs. Whereas traditional backoff word n-gram LMs use the (n-1)-gram to estimate the likelihood of an unseen n-gram event, backoff hierarchical class n-gram LMs use a class hierarchy to define an appropriate context. In this paper, we study the impact of the hierarchy depth on the performance of the approach. Performance is evaluated on several databases, such as Switchboard, CallHome, and the Wall Street Journal (WSJ). Results show that larger improvements are achieved when a shallow word tree (few levels) is used. Experiments show up to 26% improvement in the perplexity of unseen events and up to 12% improvement in the word error rate (WER).
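
To make the backoff contrast concrete, below is a minimal, hypothetical Python sketch, not the paper's implementation: standard word backoff discards an unseen history entirely and falls back to the unigram, while the class-hierarchy variant first replaces the history word with progressively coarser classes from a word tree. The corpus, the fixed backoff weight alpha, and the word_to_classes mapping are illustrative assumptions; a real LM would use proper discounting and normalized backoff weights.

    from collections import defaultdict

    class BackoffBigram:
        """Standard backoff: fall from P(w | h) to P(w) on unseen bigrams."""
        def __init__(self, corpus):
            self.uni = defaultdict(int)
            self.bi = defaultdict(int)
            for sent in corpus:
                for i, w in enumerate(sent):
                    self.uni[w] += 1
                    if i > 0:
                        self.bi[(sent[i - 1], w)] += 1
            self.total = sum(self.uni.values())

        def prob(self, hist, w, alpha=0.4):
            if (hist, w) in self.bi:
                return self.bi[(hist, w)] / self.uni[hist]
            # Unseen bigram: drop the history and back off to the unigram.
            return alpha * self.uni.get(w, 0) / self.total

    class HierarchicalClassBackoff(BackoffBigram):
        """Class-hierarchy backoff: replace the history word with
        progressively coarser classes before discarding it altogether."""
        def __init__(self, corpus, word_to_classes):
            super().__init__(corpus)
            # word_to_classes[w] = [leaf_class, ..., root_class] in a word tree.
            self.word_to_classes = word_to_classes
            self.class_bi = defaultdict(int)
            self.class_uni = defaultdict(int)
            for sent in corpus:
                for i in range(1, len(sent)):
                    for c in word_to_classes.get(sent[i - 1], []):
                        self.class_bi[(c, sent[i])] += 1
                        self.class_uni[c] += 1

        def prob(self, hist, w, alpha=0.4):
            if (hist, w) in self.bi:
                return self.bi[(hist, w)] / self.uni[hist]
            weight = alpha
            # Unseen word bigram: try each class level, finest to coarsest.
            for c in self.word_to_classes.get(hist, []):
                if (c, w) in self.class_bi:
                    return weight * self.class_bi[(c, w)] / self.class_uni[c]
                weight *= alpha
            # No class context observed either: fall back to the unigram.
            return weight * self.uni.get(w, 0) / self.total

    # Toy usage: "red" and "blue" share the class COLOR, so the unseen
    # bigram "blue car" gets credit from the seen "red car".
    corpus = [["the", "red", "car"], ["the", "red", "truck"], ["a", "blue", "sky"]]
    classes = {"red": ["COLOR"], "blue": ["COLOR"]}
    lm = HierarchicalClassBackoff(corpus, classes)
    print(lm.prob("blue", "car"))   # nonzero via the COLOR class context
    print(lm.prob("blue", "boat"))  # backs off all the way to the unigram

Deeper hierarchies add more class levels to the backoff chain; the paper's finding that shallow trees work better suggests that coarse class levels add little beyond plain unigram backoff.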