Authors: X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang
DOI: 10.1007/978-1-4613-1367-0_20
Keywords: Dictation, Spoken language, Speaker recognition, Acoustic model, Speech recognition, Computer science, Word error rate, Speech technology, Usability, Commercial speech
Abstract: In this chapter, we first review Sphinx-II, a large-vocabulary, speaker-independent continuous speech recognition system developed at Carnegie Mellon University, summarizing the techniques that helped Sphinx-II achieve state-of-the-art performance. We then review Whisper, developed here at Microsoft Corporation, focusing on accuracy, efficiency, and usability issues. These three issues are critical to the success of commercial applications. Whisper has significantly improved its performance in these areas. It can be configured as a spoken language front-end (telephony or desktop) or as a dictation application.