The 1998 HTK system for transcription of conversational telephone speech

Authors: T. Hain, P.C. Woodland, T.R. Niesler, E.W.D. Whittaker

DOI: 10.1109/ICASSP.1999.758061

Keywords: Word error rate, Transcription (software), Telephony, Natural language, Vocal tract, Triphone, Natural language processing, Cepstrum, Artificial intelligence, Speech recognition, Computer science, Hidden Markov model, NIST

Abstract: This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech, as used in the NIST Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built using speaker adaptive training (SAT), maximum likelihood linear regression (MLLR) adaptation, and a confidence score based system combination. A detailed description of the complete system, together with experimental results for each stage of our multi-pass decoding scheme, is presented. The word error rate obtained is almost 20% better than that of the 1997 system on the development set.
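As an illustration of the side-based cepstral feature normalisation named in the abstract, the following is a minimal sketch only. It assumes that "side-based" means statistics are pooled over all frames of one conversation side, and that the normalisation is per-coefficient mean and variance normalisation; the abstract does not state which form the system uses. The function names `normalise_side` and `normalise_conversation` and the `eps` floor are hypothetical and are not taken from HTK.

```python
from statistics import fmean, pstdev


def normalise_side(frames, eps=1e-8):
    """Mean and variance normalise each cepstral coefficient over one side.

    frames: list of equal-length lists of floats, one list per frame.
    eps:    assumed floor on the standard deviation to avoid division by zero.
    """
    if not frames:
        return []
    # Transpose so that each column holds one coefficient across all frames.
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [max(pstdev(col), eps) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def normalise_conversation(sides):
    """Normalise every side of a conversation independently.

    sides: dict mapping a side identifier (e.g. "A", "B") to its frames.
    """
    return {side: normalise_side(frames) for side, frames in sides.items()}
```

Computing the statistics per side rather than per utterance gives each speaker and channel one stable estimate, which is the usual motivation for pooling over the whole side.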

References (7)
R. Kneser, H. Ney, Improved Clustering Techniques for Class-Based Statistical Language Modelling. Conference of the International Speech Communication Association, pp. 21-23 (1993).
A. Tuerk, P.C. Woodland, T.R. Niesler, T. Hain, E.W.D. Whittaker, S.E. Johnson, The 1997 HTK broadcast news transcription system. DARPA (1998).
J.G. Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). IEEE Automatic Speech Recognition and Understanding Workshop, pp. 347-354 (1997). DOI: 10.1109/ASRU.1997.659110
M.J.F. Gales, P.C. Woodland, Mean and variance adaptation within the MLLR framework. Computer Speech & Language, vol. 10, pp. 249-264 (1996). DOI: 10.1006/CSLA.1996.0013
D. Pye, P.C. Woodland, Experiments in speaker normalisation and adaptation for large vocabulary speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1047-1050 (1997). DOI: 10.1109/ICASSP.1997.596120
T.R. Niesler, E.W.D. Whittaker, P.C. Woodland, Comparison of part-of-speech and automatically derived category-based language models for speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 177-180 (1998). DOI: 10.1109/ICASSP.1998.674396
E. Eide, H. Gish, A parametric approach to vocal tract length normalization. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 346-348 (1996). DOI: 10.1109/ICASSP.1996.541103