The 1998 HTK system for transcription of conversational telephone speech

Authors: T. Hain, P.C. Woodland, T.R. Niesler, E.W.D. Whittaker

DOI: 10.1109/ICASSP.1999.758061

Keywords: Word error rate, Transcription (software), Telephony, Natural language, Vocal tract, Triphone, Natural language processing, Cepstrum, Artificial intelligence, Speech recognition, Computer science, Hidden Markov model, NIST

Abstract: This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built using speaker adaptive training (SAT), maximum likelihood linear regression (MLLR) speaker adaptation and a confidence score based system combination. A detailed description of the complete system together with experimental results for each stage of our multi-pass decoding scheme is presented. The word error rate obtained is almost 20% better than our 1997 system on the development set.
