Large scale discriminative training for speech recognition

Authors: D Povey, PC Woodland

Abstract: This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). The paper concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware, and have led to significant reductions in word error rate for both triphone and quinphone HMMs compared to our best models trained using maximum likelihood estimation. The MMIE lattice-based implementation used; techniques for ensuring improved generalisation; and interactions with maximum likelihood based adaptation are all discussed. Furthermore, several variations to the MMIE training scheme are introduced with the aim of reducing over-training.

References (27)
G Evermann, PC Woodland. Posterior probability decoding, confidence estimation and system combination. NIST: National Institute of Standards and Technology (2000)
A Tuerk, TR Niesler, GL Moore, T Hain, EWD Whittaker, D Povey, PC Woodland. The 1998 HTK broadcast news transcription system: development and results. DARPA (1999)
Andreas Stolcke, Lidia Mangu, Eric Brill. Finding consensus among words: lattice-based word error minimization. Conference of the International Speech Communication Association (1999)
B. Merialdo. Phonetic recognition using hidden Markov models and maximum mutual information training. International Conference on Acoustics, Speech, and Signal Processing, pp. 111-114 (1988), DOI: 10.1109/ICASSP.1988.196524
L. Bahl, P. Brown, P. de Souza, R. Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 11, pp. 49-52 (1986), DOI: 10.1109/ICASSP.1986.1169179
M.J.F. Gales. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, vol. 12, pp. 75-98 (1998), DOI: 10.1006/CSLA.1998.0043
V. Valtchev, J.J. Odell, P.C. Woodland, S.J. Young. MMIE training of large vocabulary recognition systems. Speech Communication, vol. 22, pp. 303-314 (1997), DOI: 10.1016/S0167-6393(97)00029-0
M.J.F. Gales, P.C. Woodland. Mean and variance adaptation within the MLLR framework. Computer Speech & Language, vol. 10, pp. 249-264 (1996), DOI: 10.1006/CSLA.1996.0013