Bayesian learning of Gaussian mixture densities for hidden Markov models

Authors: Jean-Luc Gauvain, Chin-Hui Lee

DOI: 10.3115/112405.112457

Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented, and preliminary results applying it to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained compared to speaker-independent results. Using Bayesian learning for HMM parameter smoothing and sex-dependent modeling, a 21% error reduction was observed on the FEB91 test.
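
The idea of folding prior knowledge into training through a prior density can be illustrated with a minimal sketch. This is not the authors' procedure, which treats full Gaussian mixtures inside an HMM; it covers only the simplest special case, a single Gaussian with known covariance and a conjugate normal prior on its mean. The function name `map_mean` and the prior weight `tau` are assumptions introduced here for illustration.

```python
import numpy as np

def map_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean under a conjugate normal prior.

    Assumes the covariance is known and the prior on the mean is normal,
    centred at prior_mean, with tau acting as a pseudo-count of prior
    observations (hypothetical parameter name).
    data must have shape (n_frames, n_dims).
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != prior_mean.shape[0]:
        raise ValueError("data must have shape (n_frames, n_dims)")
    n = data.shape[0]
    # Weighted combination of the prior mean and the observed frames.
    return (tau * prior_mean + data.sum(axis=0)) / (tau + n)

# Speaker-independent mean used as the prior, two adaptation frames.
si_mean = [2.0, 4.0]
frames = [[4.0, 8.0], [6.0, 12.0]]
adapted = map_mean(si_mean, tau=2.0, data=frames)
```

With little adaptation data the estimate stays near the prior mean, and as the number of frames grows it approaches the sample mean, which is the behaviour that makes this kind of estimate attractive for smoothing and adaptation with sparse data.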
