Authors: T. Hain, P.C. Woodland, T.R. Niesler, E.W.D. Whittaker
DOI: 10.1109/ICASSP.1999.758061
Keywords: Word error rate, Transcription (software), Telephony, Natural language, Vocal tract, Triphone, Natural language processing, Cepstrum, Artificial intelligence, Speech recognition, Computer science, Hidden Markov model, NIST
Abstract: This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built using speaker adaptive training (SAT), maximum likelihood linear regression (MLLR) speaker adaptation, and a confidence score based system combination. A detailed description of the complete system, together with experimental results for each stage of our multi-pass decoding scheme, is presented. The word error rate obtained is almost 20% better than that of our 1997 system on the development set.