From Sphinx-II to Whisper — Making Speech Recognition Usable

Authors: X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang

Keywords: Dictation, Spoken language, Speaker recognition, Acoustic model, Speech recognition, Computer science, Word error rate, Speech technology, Usability, Commercial speech

Abstract: In this chapter, we first review Sphinx-II, a large-vocabulary speaker-independent continuous speech recognition system developed at Carnegie Mellon University, summarizing the techniques that helped Sphinx-II achieve state-of-the-art performance. We then review Whisper, a system developed here at Microsoft Corporation, focusing on accuracy, efficiency, and usability issues. These three issues are critical to the success of commercial applications. Whisper has significantly improved its performance in these areas. It can be configured as a spoken language front-end (telephony or desktop) or as a dictation application.


References (34)

Hermann Ney. Modeling and search in continuous speech recognition. Conference of the International Speech Communication Association (1993).

Mei-Yuh Hwang, Hsiao-Wuen Hon, Kai-Fu Lee. Modeling between-word coarticulation in continuous speech recognition. Conference of the International Speech Communication Association, pp. 1005-1008 (1989).

Kai-Fu Lee. The conversational computer: an Apple perspective. Conference of the International Speech Communication Association (1993).

Renato De Mori, Pietro Laface. Speech Recognition and Understanding: Recent Advances, Trends, and Applications. Springer-Verlag (1997).

Fileno A. Alleva. Search Organization for Large Vocabulary Continuous Speech Recognition. Springer, Berlin, Heidelberg, pp. 217-222 (1992). doi:10.1007/978-3-642-76626-8_25

Leonard R. Marino. Principles of Computer Design (1986).

Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman. Classification and Regression Trees (1983).

Xuedong Huang. Minimizing speaker variation effects for speaker-independent speech recognition. Proceedings of the Workshop on Speech and Natural Language - HLT '91, pp. 191-196 (1992). doi:10.3115/1075527.1075569

Lalit R. Bahl, Frederick Jelinek, Robert L. Mercer. A Maximum Likelihood Approach to Continuous Speech Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, pp. 179-190 (1983). doi:10.1109/TPAMI.1983.4767370

H. Ney, R. Haeb-Umbach, B.-H. Tran, M. Oerder. Improvements in beam search for 10000-word continuous speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 9-12 (1992). doi:10.1109/ICASSP.1992.225985
