An overview of decoding techniques for large vocabulary continuous speech recognition

Author: Xavier L. Aubert

DOI: 10.1006/CSLA.2001.0185

Abstract: A number of decoding strategies for large vocabulary continuous speech recognition (LVCSR) are examined from the viewpoint of their search space representation. Different design solutions are compared with respect to the integration of linguistic and acoustic constraints, as implied by m-gram language models (LM) and cross-word (CW) phonetic contexts. This study is structured along two main axes: the network expansion and the search algorithm itself. The network can be expanded statically or dynamically, while the search can proceed either time-synchronously or asynchronously, which leads to distinct architectures. Three broad classes of decoding methods are briefly reviewed: the use of weighted finite state transducers (WFST) for static network expansion, the time-synchronous dynamic-expansion search, and the asynchronous stack decoding. Heuristic methods for further reducing the search space are also considered. The main approaches are compared and some prospective views are formulated regarding possible future avenues.
