A Unified Neural Network-Based Speaker Localization Technique
01 July 2000
Locating and tracking a speaker in real time using microphone arrays is important in many applications, such as hands-free video conferencing, speech processing in large rooms, and acoustic echo cancellation. A speaker may move from the far field to the near field of the array, or vice versa. Many neural network based localization techniques exist, but each is applicable only to far-field or only to near-field sources, and they are computationally intensive for real-time speaker localization because of the wideband nature of speech. We propose a unified neural network based source localization technique that is simultaneously applicable to wideband and narrowband signal sources in the far field or near field of a microphone array. The technique uses a multilayer perceptron (MLP) feedforward neural network and forms the feature vectors by computing the normalized instantaneous cross-power spectrum samples between adjacent pairs of sensors. Simulation results indicate that our technique is able to locate a source with an absolute error of less than $3.5$ degrees at an SNR of $20$ dB and a sampling rate of $8000$ Hz at each sensor.
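The feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array size, frame length, FFT size, and the function name `cross_power_features` are all assumptions for the example. Each adjacent sensor pair contributes the unit-magnitude (normalized) samples of its instantaneous cross-power spectrum, split into real and imaginary parts, and the pairs' features are concatenated into one vector suitable as MLP input.

```python
import numpy as np

def cross_power_features(frames, n_fft=256):
    """Feature vector from normalized instantaneous cross-power spectra
    between adjacent sensor pairs (illustrative sketch, not the paper's code).

    frames: array of shape (n_sensors, frame_len), one signal frame per sensor.
    Returns a real-valued vector: real and imaginary parts of the normalized
    cross-power spectrum, concatenated over all adjacent pairs.
    """
    spectra = np.fft.rfft(frames, n=n_fft, axis=1)       # per-sensor spectra
    feats = []
    for i in range(spectra.shape[0] - 1):                # adjacent sensor pairs
        cps = spectra[i] * np.conj(spectra[i + 1])       # cross-power spectrum
        cps = cps / (np.abs(cps) + 1e-12)                # normalize to unit magnitude
        feats.append(np.concatenate([cps.real, cps.imag]))
    return np.concatenate(feats)

# Example: a 4-sensor array and one simulated 256-sample frame per sensor
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 256))
fv = cross_power_features(frames)
print(fv.shape)  # 3 pairs x 2 x 129 rfft bins -> (774,)
```

Because the magnitude is normalized away, the features depend only on the inter-sensor phase differences, which carry the time-delay (and hence direction and range) information the network learns to map to a source location.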