Real-time speaker-independent large vocabulary continuous speech recognition

Authors: Xiaolong Li, Yunxin Zhao

Abstract: In this dissertation, a real-time decoding engine for speaker-independent large vocabulary continuous speech recognition (LVCSR) is presented. An overview is first given covering the state-of-the-art decoding algorithms for LVCSR. Since accuracy, speed, and memory cost are three indispensable and correlated performance measurements for a practical system, all three aspects are carefully considered, with the main innovations in fast and memory-efficient decoding algorithms. For accuracy, a crossword triphone based Hidden Markov Model (HMM) is used in the developed system, which has been proved to generate significantly higher accuracy than a within-word triphone HMM. With the use of the crossword model, the search space dramatically increases compared with a system using the within-word model. Crossword language model lookahead and fan-out arc tying are used to make the search space as compact as possible. Five heuristic pruning methods and two lookahead techniques are also exploited to reduce the search space with little or no loss of accuracy. For speed and memory cost, a novel algorithm, Order-Preserving Language Model Context Pre-computing (OPCP), is proposed for fast language model (LM) lookup, resulting in significant improvement in both overall decoding time and memory space without any decrease in recognition accuracy. OPCP is an integration of two previously proposed methods: Minimum Perfect Hashing (MPH) and Language Model Context Pre-computing (LMCP). By reducing hashing operations through order-preserving access of LM scores, OPCP cuts down LM lookup time effectively. In the meantime, OPCP reduces memory cost because of the reduced size of the hashing keys and the need for only the last word index of each N-gram in LM storage. Experimental results are reported on two LVCSR tasks (Wall Street Journal 20K and Switchboard 33K) with three sizes of trigram LMs (small, medium, large). In comparison with the MPH and LMCP methods, OPCP reduced LM lookup time from about 30% to 80% of total decoding time to about 8% to 14%. Except for the small LM, the memory used by OPCP for LM lookup and storage was about the same as or less than the original LM storage, and much less than that of the compared methods. The time and memory savings by OPCP became more pronounced with the increase of LM size. By using the OPCP method and the other optimizations mentioned above, our one-pass LVCSR decoding engine, named TigerEngine, reached real-time speed on both the Wall Street Journal 20K and Switchboard 33K tasks, on a Dell workstation with one 3.2 GHz Xeon CPU.
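
As a rough illustration of the storage idea mentioned in the abstract (keeping only the last word index of each N-gram), the sketch below groups trigrams by their two-word context and stores one last-word id per entry. It is a toy under stated assumptions, not the dissertation's OPCP algorithm: the class name ContextGroupedLM, the integer word ids, the absence of backoff, and the use of binary search in place of hashing are all hypothetical choices made for the example.

```python
from bisect import bisect_left


class ContextGroupedLM:
    """Toy trigram store (hypothetical): one index range per two-word
    context, with only the last word id kept for each trigram entry."""

    def __init__(self, trigrams):
        # trigrams: dict mapping (w1, w2, w3) integer ids -> log-probability
        self.ranges = {}      # (w1, w2) -> (start, end) into the flat arrays
        self.last_words = []  # last word ids, sorted within each context
        self.scores = []      # log-probabilities, parallel to last_words
        # Sorting by key makes each context contiguous and orders w3 within it.
        for (w1, w2, w3), logp in sorted(trigrams.items()):
            start, _ = self.ranges.get((w1, w2), (len(self.last_words), 0))
            self.last_words.append(w3)
            self.scores.append(logp)
            self.ranges[(w1, w2)] = (start, len(self.last_words))

    def lookup(self, w1, w2, w3):
        """Return the stored score, or None if the trigram is absent."""
        rng = self.ranges.get((w1, w2))
        if rng is None:
            return None
        start, end = rng
        # Binary search restricted to this context's slice of last-word ids.
        i = bisect_left(self.last_words, w3, start, end)
        if i < end and self.last_words[i] == w3:
            return self.scores[i]
        return None


lm = ContextGroupedLM({(1, 2, 3): -0.5, (1, 2, 7): -1.25, (4, 5, 6): -2.0})
score = lm.lookup(1, 2, 7)
```

Because the context is resolved once per range, each stored entry needs a single word id rather than a full three-word key, which is the kind of key-size saving the abstract alludes to.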

References (79)
Volker Steinbiss, Bach-Hiep Tran, Hermann Ney. Improvements in beam search. Conference of the International Speech Communication Association (1994)
S. J. Young, N. H. Russell, J. H. S. Thornton. Token passing: a simple conceptual model for connected speech recognition systems. University of Cambridge, Department of Engineering (1989)
Frederick Jelinek. Up from trigrams! - the struggle for improved language models. Conference of the International Speech Communication Association (1991)
Alex Acero, Xuedong Huang, Hsiao-Wuen Hon. Spoken Language Processing. Prentice-Hall, pp. 1008- (2001)
Yunxin Zhao, Xiao Zhang. Minimum perfect hashing for fast n-gram language model lookup. Conference of the International Speech Communication Association (2002)
Mei-Yuh Hwang, Hsiao-Wuen Hon, Kai-Fu Lee. Modeling between-word coarticulation in continuous speech recognition. Conference of the International Speech Communication Association, pp. 1005-1008 (1989)
Stefan Ortmanns, Wu Chou, Wolfgang Reichl. An efficient decoding method for real time speech recognition. Conference of the International Speech Communication Association (1999)
Achim Sixtus, Hermann Ney. Across-word phoneme models for large vocabulary continuous speech recognition. Publikationsserver der RWTH Aachen University (2003)