Authors: Xiaolong Li, Yunxin Zhao
DOI:
Keywords:
Abstract: In this dissertation, a real-time decoding engine for speaker-independent large vocabulary continuous speech recognition (LVCSR) is presented. An overview is first given of the state-of-the-art algorithms for LVCSR. Since accuracy, speed, and memory cost are three indispensable and correlated performance measurements of a practical system, all of these aspects are carefully considered, with the main innovations lying in fast and memory-efficient search algorithms. For acoustic modeling, crossword triphone based Hidden Markov Models (HMMs) are used, since crossword modeling has been proved to generate significantly higher accuracy than within-word modeling; with crossword models, however, the search space increases dramatically compared with a system using within-word models. Crossword language model lookahead and fan-out arc tying are employed to make the search network as compact as possible, and five heuristic pruning methods along with two additional techniques are exploited to reduce computation with little or no loss of accuracy. To reduce the speed and memory cost of language model (LM) lookup, a novel algorithm, Order-Preserving Context Pre-computing (OPCP), is proposed, resulting in significant improvement in both lookup time and overall decoding time without any decrease in recognition accuracy. OPCP is an integration of two previously proposed methods: Minimum Perfect Hashing (MPH) and LM Context Pre-computing (LMCP). By reducing the number of hashing operations through order-preserving access to LM scores, OPCP cuts down lookup time effectively; in the meantime, it reduces memory cost because of the reduced size of hashing keys, which need only the last word index of each N-gram for storage. Experimental results are reported on two LVCSR tasks (Wall Street Journal 20K and Switchboard 33K) with three sizes of trigram LMs (small, medium, and large). In comparison with the MPH and LMCP methods, LM lookup time was reduced from about 30%∼80% of total decoding time to 8%∼14%. Except for the small LM, the LM storage required by OPCP was the same as or less than the original N-gram storage, and much less than that of the other methods; the savings achieved by OPCP became more pronounced with the increase of LM size. With the OPCP method and the other optimizations mentioned above, our one-pass decoding engine, named TigerEngine, reached real-time speed on both the Wall Street Journal 20K and Switchboard 33K tasks, on the platform of a Dell workstation with one 3.2 GHz Xeon CPU.
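The sketch below is a rough, hypothetical illustration (not the dissertation's implementation) of the two ideas the abstract attributes to OPCP: pre-computing all trigram scores that share a context (w1, w2) so they can be accessed in order, and keying each stored entry by only the last word index of the N-gram. The class name OpcpSketch, the use of std::unordered_map for contexts, and the binary search over a sorted block are all simplifying assumptions; the actual OPCP method builds on minimum perfect hashing and pre-computes LM contexts during decoding.

```cpp
// Minimal sketch of order-preserving, last-word-keyed trigram lookup.
// Hypothetical data layout; not the TigerEngine implementation.
#include <algorithm>
#include <cstdio>
#include <unordered_map>
#include <vector>

using WordId = int;

struct LastWordScore {
    WordId w3;       // last word of the trigram (the only per-entry key kept)
    float  logprob;  // trigram log-probability P(w3 | w1, w2)
};

class OpcpSketch {
public:
    // Build step: group trigrams by their (w1, w2) context.
    void AddTrigram(WordId w1, WordId w2, WordId w3, float logprob) {
        table_[Context(w1, w2)].push_back({w3, logprob});
    }
    // Sort each context block by the last word index so decode-time access
    // is order-preserving and needs no full-trigram hashing.
    void Finalize() {
        for (auto& kv : table_)
            std::sort(kv.second.begin(), kv.second.end(),
                      [](const LastWordScore& a, const LastWordScore& b) {
                          return a.w3 < b.w3;
                      });
    }
    // Decode-time lookup: only the last word index is used as the key
    // within the pre-computed block. Returns true if the trigram exists.
    bool Lookup(WordId w1, WordId w2, WordId w3, float* logprob) const {
        auto it = table_.find(Context(w1, w2));
        if (it == table_.end()) return false;  // a real LM would back off here
        const auto& block = it->second;
        auto pos = std::lower_bound(
            block.begin(), block.end(), w3,
            [](const LastWordScore& e, WordId w) { return e.w3 < w; });
        if (pos == block.end() || pos->w3 != w3) return false;
        *logprob = pos->logprob;
        return true;
    }

private:
    static long long Context(WordId w1, WordId w2) {
        return (static_cast<long long>(w1) << 32) | static_cast<unsigned>(w2);
    }
    std::unordered_map<long long, std::vector<LastWordScore>> table_;
};

int main() {
    OpcpSketch lm;
    // Toy trigrams with made-up word indices and scores.
    lm.AddTrigram(5, 9, 17, -1.2f);
    lm.AddTrigram(5, 9, 3, -2.7f);
    lm.Finalize();
    float lp;
    if (lm.Lookup(5, 9, 17, &lp)) std::printf("logprob = %f\n", lp);
    return 0;
}
```

The point of the sketch is only the key-size argument made in the abstract: once the context is fixed and its scores are stored contiguously in word-index order, each stored entry needs just the last word index rather than the full N-gram, which shrinks the keys and avoids repeated hashing of whole trigrams.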