An Auditory System-Based Feature for Robust Speech Recognition

01 January 2001

A new auditory-based feature extraction algorithm for robust speech recognition in adverse acoustic environments is presented. The algorithm is developed from an analysis of the human peripheral auditory system: we first divide the auditory system into several modules, then model each module from a signal-processing point of view under a constraint on computational complexity. The feature computation comprises an outer- and middle-ear transfer function, an FFT, frequency conversion from the linear to the Bark scale, auditory filtering, a nonlinearity, and a discrete cosine transform.
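
To make the six-stage pipeline concrete, here is a minimal NumPy/SciPy sketch of one frame of feature computation. The ear-transfer weighting, triangular Bark-scale filters, band count, and cube-root nonlinearity are illustrative stand-ins, not the paper's exact parameters.

    import numpy as np
    from scipy.fftpack import dct

    def hz_to_bark(f):
        # One common Hz-to-Bark mapping (Traunmueller/Zwicker style).
        return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

    def auditory_features(frame, fs=8000, n_fft=256, n_bands=20, n_ceps=12):
        # 1. Outer/middle-ear transfer function: a hypothetical high-pass
        #    emphasis standing in for the paper's measured response.
        freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
        ear_gain = (freqs / (freqs + 300.0)) ** 2      # illustrative only

        # 2. FFT power spectrum of a windowed frame.
        spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
        spectrum *= ear_gain

        # 3. Linear-to-Bark conversion and auditory filtering: triangular
        #    filters spaced uniformly on the Bark scale (a common stand-in).
        bark = hz_to_bark(freqs)
        centers = np.linspace(bark[1], bark[-1], n_bands + 2)
        energies = np.empty(n_bands)
        for i in range(n_bands):
            lo, mid, hi = centers[i], centers[i + 1], centers[i + 2]
            rising = (bark - lo) / (mid - lo)
            falling = (hi - bark) / (hi - mid)
            weights = np.clip(np.minimum(rising, falling), 0.0, None)
            energies[i] = weights @ spectrum

        # 4. Nonlinearity: cube-root compression (as in PLP; the paper's
        #    nonlinearity may differ).
        compressed = np.cbrt(energies)

        # 5. Discrete cosine transform -> cepstrum-like coefficients.
        return dct(compressed, type=2, norm='ortho')[:n_ceps]

    feats = auditory_features(np.random.randn(200))  # one 25 ms frame at 8 kHz
    print(feats.shape)                               # (12,)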

The feature is evaluated on two recognition tasks: connected-digit recognition and large-vocabulary continuous speech recognition. The test databases cover various operating environments, such as handset and hands-free use over landline and wireless channels, with additive car and babble noise. Compared with the LPCC, MFCC, and PLP features, the proposed feature achieves an average 20% to 30% string error rate reduction on the connected-digit task and an 8% to 14% word error rate reduction on the Wall Street Journal task under various additive noise conditions.

The computational complexity of the new feature is comparable to that of other commonly used feature-extraction algorithms. Computing the feature takes only about 1% of real time on today's computers, which makes it a practical choice for real-world recognition applications.
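
"About 1% of real time" corresponds to a real-time factor near 0.01, i.e., roughly 10 ms of computation per second of speech. A quick way to estimate this for any frame-based extractor (here the hypothetical sketch above, on synthetic audio) is:

    import time
    import numpy as np

    fs, dur = 8000, 10.0                     # 10 s of synthetic "speech"
    signal = np.random.randn(int(fs * dur))
    frame_len, hop = 200, 80                 # 25 ms frames, 10 ms hop

    start = time.perf_counter()
    for i in range(0, len(signal) - frame_len + 1, hop):
        auditory_features(signal[i:i + frame_len], fs=fs)
    elapsed = time.perf_counter() - start

    print(f"real-time factor: {elapsed / dur:.1%}")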