Title: "Adaptive Boosting for Automatic Speech Recognition"
The atomic units of most automatic speech recognition (ASR) systems are phonemes. However, the most widely used features in ASR, perceptual linear prediction (PLP) and mel-frequency cepstral coefficients (MFCC), do not carry phoneme information explicitly. Discriminative features that encode phoneme information have been shown to be more powerful for ASR accuracy. Generating such discriminative features relies on training classifiers that transform the original features into new probabilistic features. One of the most commonly used techniques for modeling probabilities over continuous distributions is the Gaussian mixture model (GMM). In this work, a GMM-based classifier converts each acoustic feature vector into a vector of posterior probabilities over all classes. Furthermore, an adaptive boosting (AdaBoost) algorithm is applied to combine the classifiers and enhance performance. Training GMM-based AdaBoost classifiers is computationally very expensive. To make it feasible for very large vocabulary speech recognition systems with thousands of hours of training data, we implemented a hierarchical AdaBoost that splits the training into multiple parallel processes. This speedup reduced the training time from more than 100 days to within a week. The AdaBoost features were then successfully combined with spectral features for ASR. Compared to a baseline using the standard features, the AdaBoost system reduced the word error rate (WER) by 2%. Moreover, the AdaBoost system also contributed consistent gains in system combination, even against a very strong baseline.
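The core transformation described above, mapping an acoustic feature vector to a posterior probability vector over classes using per-class GMMs, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the class count, feature dimensionality, mixture sizes, and synthetic data are all assumptions, and a uniform class prior is assumed.

```python
# Sketch: per-class GMMs turn each acoustic frame into a posterior vector.
# All data and hyperparameters here are illustrative, not from the thesis.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_classes, dim = 3, 13  # e.g. 3 phoneme-like classes, 13-dim MFCC-like frames

# Synthetic per-class "frames", standing in for real MFCC/PLP features.
train = {c: rng.normal(loc=3.0 * c, scale=1.0, size=(200, dim))
         for c in range(n_classes)}

# One GMM per class models p(x | class).
gmms = [GaussianMixture(n_components=2, random_state=0).fit(train[c])
        for c in range(n_classes)]

def posterior_features(x):
    """Map a frame x of shape (dim,) to p(class | x), assuming a
    uniform class prior: normalize the per-class likelihoods."""
    log_lik = np.array([g.score_samples(x[None, :])[0] for g in gmms])
    log_lik -= log_lik.max()            # subtract max for numerical stability
    p = np.exp(log_lik)
    return p / p.sum()

frame = train[1][0]                     # a frame drawn from class 1
post = posterior_features(frame)
print(post, post.argmax())
```

These posterior vectors are the probabilistic features; in the boosted setup, several such GMM classifiers would act as weak learners whose weighted votes AdaBoost combines.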
Advisor: John Makhoul (Raytheon BBN Technologies)
Professor Gilead Tadmor
Professor Jennifer Dy
Spyros Matsoukas (Amazon.com)