Improvements of search error risk minimization in Viterbi beam search for speech recognition

Authors: Takaaki Hori, Shinji Watanabe, Atsushi Nakamura

DOI:

Keywords: Machine learning, Word error rate, Speech recognition, Minification, Viterbi beam search, Heuristic, Pruning (decision trees), Computer science, Beam search, Artificial intelligence, Heuristic (computer science), Function (mathematics), Vocabulary

Abstract: This paper describes improvements in a search error risk minimization approach to fast beam search for speech recognition. In our previous work, we proposed this approach to reduce search errors by optimizing the pruning criterion. While conventional methods use heuristic criteria to prune hypotheses, our method employs a pruning function that makes a more precise decision using rich features extracted from each hypothesis. The parameters of the function can be estimated to minimize a loss function based on the search error risk. In this paper, we improve this method by introducing a modified loss function, arc-averaged risk, which potentially has a higher correlation with the actual error rate than the original one. We also investigate various combinations of features. Experimental results show that a further search error reduction over the original method is obtained in a 100K-word vocabulary lecture speech transcription task.

Index Terms: speech recognition, beam search, pruning, search error, WFST
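The contrast the abstract draws between a heuristic pruning criterion and a feature-based pruning function can be sketched as follows. This is only an illustration, not the authors' method: the abstract does not state the form of the pruning function or which features it uses, so the linear decision rule, the `Hypothesis` class, the feature layout, and the weight values below are all assumptions. Estimating the weights by minimizing a risk-based loss is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    score: float               # accumulated log score (higher is better)
    features: Sequence[float]  # hypothetical per-hypothesis features


def heuristic_prune(hyps: List[Hypothesis], beam_width: float) -> List[Hypothesis]:
    """Conventional beam pruning: keep hypotheses whose score lies
    within beam_width of the best score in the current frame."""
    if not hyps:
        return []
    best = max(h.score for h in hyps)
    return [h for h in hyps if h.score >= best - beam_width]


def learned_prune(hyps: List[Hypothesis],
                  weights: Sequence[float],
                  bias: float = 0.0) -> List[Hypothesis]:
    """Feature-based pruning (assumed linear form): keep a hypothesis
    when bias + weights . features is non-negative."""
    kept = []
    for h in hyps:
        decision = bias + sum(w * f for w, f in zip(weights, h.features))
        if decision >= 0.0:
            kept.append(h)
    return kept


# Hypothetical features: [score gap to the best hypothesis, some other cue].
hyps = [
    Hypothesis(score=-10.0, features=[0.0, 1.0]),
    Hypothesis(score=-14.0, features=[4.0, 0.2]),
    Hypothesis(score=-25.0, features=[15.0, 0.9]),
]
survivors_heuristic = heuristic_prune(hyps, beam_width=5.0)
survivors_learned = learned_prune(hyps, weights=[-1.0, 2.0], bias=3.0)
```

With these made-up values, the feature-based rule can discard a hypothesis that a fixed beam width would keep, because it weighs a second cue in addition to the score gap.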

References (10)
Hitoshi Isahara, Sadaoki Furui, Kikuo Maekawa, Hanae Koiso. Spontaneous Speech Corpus of Japanese. Language Resources and Evaluation (2000).
Xavier L. Aubert. An overview of decoding techniques for large vocabulary continuous speech recognition. Computer Speech & Language, vol. 16, pp. 89-114 (2002). DOI: 10.1006/CSLA.2001.0185
H. Ney, R. Haeb-Umbach, B.-H. Tran, M. Oerder. Improvements in beam search for 10000-word continuous speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 9-12 (1992). DOI: 10.1109/ICASSP.1992.225985
Hal Daumé, Daniel Marcu. Learning as search optimization. Proceedings of the 22nd International Conference on Machine Learning - ICML '05, pp. 169-176 (2005). DOI: 10.1145/1102351.1102373
Takaaki Hori, Shinji Watanabe, Atsushi Nakamura. Search error risk minimization in Viterbi beam search for speech recognition. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4934-4937 (2010). DOI: 10.1109/ICASSP.2010.5495099
Yuehua Xu, Alan Fern. On learning linear ranking functions for beam search. International Conference on Machine Learning, pp. 1047-1054 (2007). DOI: 10.1145/1273496.1273628
S. Ortmanns, A. Eiden, H. Ney. Improved lexical tree search for large vocabulary speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 817-820 (1998). DOI: 10.1109/ICASSP.1998.675390
Bruce T. Lowerre. The HARPY speech recognition system (1976).
Takaaki Hori, Chiori Hori, Yasuhiro Minami, Atsushi Nakamura. Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 1352-1365 (2007). DOI: 10.1109/TASL.2006.889790
Erik McDermott, Timothy J. Hazen, Jonathan Le Roux, Atsushi Nakamura, Shigeru Katagiri. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error. IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 203-223 (2007). DOI: 10.1109/TASL.2006.876778