From Sphinx-II to Whisper — Making Speech Recognition Usable

Authors: X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang

DOI: 10.1007/978-1-4613-1367-0_20

Keywords: Dictation, Spoken language, Speaker recognition, Acoustic model, Speech recognition, Computer science, Word error rate, Speech technology, Usability, Commercial speech

Abstract: In this chapter, we first review Sphinx-II, a large-vocabulary, speaker-independent continuous speech recognition system developed at Carnegie Mellon University, summarizing the techniques that helped Sphinx-II achieve state-of-the-art performance. We then review Whisper, a system developed here at Microsoft Corporation, focusing on accuracy, efficiency, and usability issues. These three issues are critical to the success of commercial applications. Whisper has significantly improved its performance in these areas. It can be configured as a spoken language front-end (telephony or desktop) or as a dictation application.
