A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches

Authors: Shinji Watanabe, Atsushi Nakamura

DOI: 10.1109/ICASSP.2008.4518602

Keywords: Time evolution, Artificial intelligence, Maximum a posteriori estimation, Pattern recognition, Acoustic model, Computer science, Classifier (UML), Regression analysis

Abstract: Incremental adaptation techniques for speech recognition are aimed at adjusting acoustic models quickly and stably to time-variant characteristics due to temporal changes of speaker, speaking style, noise source, etc. We proposed a novel incremental adaptation framework based on a macroscopic time evolution system, which models the time-variant characteristics by successively updating posterior distributions of model parameters. In this paper, we provide a unified interpretation of the proposal and the two major conventional approaches: indirect adaptation via transformation parameters (e.g. maximum likelihood linear regression (MLLR)) and direct adaptation of classifier parameters (e.g. maximum a posteriori (MAP)). We reveal analytically and experimentally that the proposed incremental adaptation involves both conventional approaches and their combinatorial approaches, and simultaneously possesses their quick and stable adaptation characteristics.

References (13)
Chin-Hui Lee, Tatsuo Matsuoka, A study of on-line Bayesian adaptation for HMM-based speech recognition. Conference of the International Speech Communication Association (1993).
Koichi Shinoda, Takao Watanabe, Speaker adaptation with autonomous control using tree structure. Conference of the International Speech Communication Association (1995).
G. Zavaliagkos, R. Schwartz, J. Makhoul, Batch, incremental and instantaneous adaptation techniques for speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 676-679 (1995). DOI: 10.1109/ICASSP.1995.479688
Jun-ichi Takahashi, Shigeki Sagayama, Vector-field-smoothed Bayesian learning for fast and incremental speaker/telephone-channel adaptation. Computer Speech & Language, vol. 11, pp. 127-146 (1997). DOI: 10.1006/CSLA.1996.0025
Shinji Watanabe, Atsushi Nakamura, Incremental Adaptation Based on a Macroscopic Time Evolution System. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 769-772 (2007). DOI: 10.1109/ICASSP.2007.367026
J.-L. Gauvain, Chin-Hui Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 291-298 (1994). DOI: 10.1109/89.279278
Kai Yu, Mark J. F. Gales, Bayesian Adaptive Inference and Adaptive Training. IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 1932-1943 (2007). DOI: 10.1109/TASL.2007.901300
V.V. Digalakis, D. Rtischev, L.G. Neumeyer, Speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 357-366 (1995). DOI: 10.1109/89.466659
Christopher J. Leggetter, Philip C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech & Language, vol. 9, pp. 171-185 (1995). DOI: 10.1006/CSLA.1995.0010
Qiang Huo, Chin-Hui Lee, On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate. IEEE Transactions on Speech and Audio Processing, vol. 5, pp. 161-172 (1997). DOI: 10.1109/89.554778