A Robust, Real-Time Endpoint Detector with Energy Normalization for ASR in Adverse Environments
01 January 2001
When automatic speech recognition (ASR) is applied to hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) conditions, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, accurate, and robust endpointing algorithm for real-time ASR. We propose a novel approach that combines a specially designed filter with a three-state decision logic for endpoint detection. The filter was designed to satisfy several criteria that ensure the accuracy and robustness of detection. The detected endpoints are simultaneously used for energy normalization. Evaluation results show that the proposed algorithm significantly reduced the string error rate on 7 of the 12 databases tested. On two of them, the reduction exceeded 50%. The algorithm uses only one-dimensional energy with a 24-frame lookahead; therefore, it has low complexity and is suitable for real-time ASR.
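To make the pipeline described above more concrete, the following is a minimal sketch of an energy-only endpoint detector with a three-state decision logic, followed by a simple energy normalization. It is an illustration under stated assumptions, not the filter or decision rules from the paper: the abstract does not specify the filter, the names of the states, the thresholds, or the form of normalization. Here `edge_filter` is a stand-in difference-of-means filter, the state names and the values of `rise`, `fall`, and `hang` are hypothetical, and normalization is assumed to mean subtracting the peak log energy of the detected speech. The parameter `lookahead=24` borrows the figure from the abstract, but how the paper spends those frames is not stated.

```python
import numpy as np

# Hypothetical state names; the abstract only says the logic has three states.
SILENCE, IN_SPEECH, LEAVING_SPEECH = range(3)


def edge_filter(log_energy, lookahead=24):
    """Stand-in edge detector (not the paper's filter): mean of the next
    `lookahead` frames, current frame included, minus the mean of the
    previous `lookahead` frames. Positive on onsets, negative on offsets."""
    padded = np.pad(log_energy, lookahead, mode="edge")
    out = np.empty(len(log_energy))
    for t in range(len(log_energy)):
        c = t + lookahead  # index of frame t inside the padded array
        out[t] = padded[c:c + lookahead].mean() - padded[c - lookahead:c].mean()
    return out


def detect_endpoints(edge, rise=6.0, fall=-6.0, hang=5):
    """Three-state decision logic over the filter output.
    `rise`, `fall` (dB) and `hang` (frames) are assumed values.
    Returns a list of (first_speech_frame, last_speech_frame) pairs."""
    state, segments = SILENCE, []
    start = end = 0
    best = 0.0         # strongest edge response seen for the current boundary
    refining = False   # True while the onset response is still above `rise`
    quiet = 0          # frames since the offset response last fell below `fall`
    for t, v in enumerate(edge):
        if state == SILENCE:
            if v > rise:
                state, start, best, refining = IN_SPEECH, t, v, True
        elif state == IN_SPEECH:
            if refining and v > rise:
                if v > best:            # move the start to the steepest rise
                    start, best = t, v
            else:
                refining = False
                if v < fall:
                    state, end, best, quiet = LEAVING_SPEECH, t - 1, v, 0
        else:  # LEAVING_SPEECH
            if v > rise:                # energy returned: a pause, not the end
                state = IN_SPEECH
            elif v < fall:
                quiet = 0
                if v < best:            # move the end to the steepest fall
                    end, best = t - 1, v
            else:
                quiet += 1
                if quiet >= hang:       # offset confirmed: close the segment
                    segments.append((start, end))
                    state = SILENCE
    if state != SILENCE:                # signal ended while still in speech
        segments.append((start, len(edge) - 1))
    return segments


def normalize_energy(log_energy, segments):
    """Assumed normalization: shift log energy so that the loudest detected
    speech frame sits at 0 dB. Returns a copy if nothing was detected."""
    if not segments:
        return log_energy.copy()
    peak = max(log_energy[s:e + 1].max() for s, e in segments)
    return log_energy - peak


# Synthetic log-energy contour in dB: 50 quiet frames, 50 loud, 50 quiet.
log_energy = np.concatenate(
    [np.full(50, 20.0), np.full(50, 60.0), np.full(50, 20.0)]
)
segments = detect_endpoints(edge_filter(log_energy))
normalized = normalize_energy(log_energy, segments)
print(segments)
```

On this idealized step contour the sketch recovers the loud region as a single segment covering frames 50 through 99. Real speech in noise has gradual, fluctuating energy, so the thresholds and the hang time would need tuning, and the stand-in filter gives no guarantee of the accuracy or robustness that the paper's designed filter is meant to provide.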