Boosting HMM acoustic models in large vocabulary speech recognition

Authors: Carsten Meyer, Hauke Schramm

DOI: 10.1016/J.SPECOM.2005.09.009

Abstract: Boosting algorithms have been successfully used to improve performance in a variety of classification tasks. Here, we suggest an approach to apply a popular boosting algorithm (called “AdaBoost.M2”) to Hidden Markov Model based speech recognizers, at the level of utterances. In recognition tasks, we show that boosting significantly improves the best test error rates obtained with standard maximum likelihood training. In addition, results in several isolated word decoding experiments show that boosting may also provide further gains over discriminative training, when both training techniques are combined. In our experiments, this also holds when comparing final classifiers with a similar number of parameters and when evaluating in decoding conditions with lexical and acoustic mismatch to the training conditions. Moreover, we present an extension of our algorithm to large vocabulary continuous speech recognition, allowing online recognition without further processing of N-best lists or word lattices. This is achieved by using a lexical approach for combining different acoustic models in decoding. In particular, we introduce a weighted summation over an extended set of alternative pronunciation models representing both the boosted models and the baseline model. In this way, arbitrarily long utterances can be recognized by the boosted ensemble in a single pass decoding framework. Evaluation results are presented on two tasks: a real-life spontaneous speech dictation task with a 60k word vocabulary and Switchboard.
