Reducing computational complexity and response latency through the detection of contentless frames

Authors: R.A. Sukkar, S.M. Herman, A.R. Setlur, C.D. Mitchell

DOI: 10.1109/ICASSP.2000.860218

Keywords: Vocabulary, Vector quantization, Latency (engineering), Speech recognition, Speech coding, Decoding methods, Computational complexity theory, Classifier (UML), Computer science, Silence

Abstract: In this paper, we present a method that manipulates the decoding network to reduce both computational complexity and response latency while maintaining high ASR accuracy. The method employs a TSVQ (tree structured vector quantization) classifier that reliably discriminates between silence and non-silence frames. Reductions in complexity and latency are achieved through three techniques: 1) frame skipping, 2) silence-based pruning of the dynamic programming network, and 3) early decision. Experimental results on a connected digit task and a large vocabulary company name task show that the proposed method can reduce response latency by more than 82%. Furthermore, computational complexity, measured in CPU seconds, was reduced by 13.6% and 6.7% while maintaining the recognition accuracy of the baseline system.
