A Vector Space Approach to Environment Modeling for Robust Speech Recognition

Authors: Yu Tsao, Chin-Hui Lee

DOI:

Keywords: Reduction (complexity), Speech recognition, Adaptation (computer science), Gaussian, Computer science, Pattern recognition, Vector space, Interpolation, Hidden Markov model, Transformation (function), Artificial intelligence, Approximation error

Abstract: We propose a vector space approach to characterizing environments for robust speech recognition. We represent a given environment by a super-vector formed by concatenating all the mean vectors of the Gaussian mixture components of the state observation densities of all the hidden Markov models trained in that particular environment. New environment super-vectors can now be obtained either by an interpolation method with a collection of super-vectors from many real or simulated environments, or by a transformation performed on an anchor super-vector for a specific environment, such as a clean condition. At a 5dB signal-to-noise ratio (SNR) level, both interpolation- and transformation-based approaches achieve a significant error rate reduction, close to 47%, from a baseline system with cepstral mean subtraction (CMS), with only two adaptation utterances. When incorporating N-best information to perform unsupervised adaptation at the same 5dB SNR level with the same two adaptation utterances, we achieve a relative error rate reduction of about 40%, close to that achieved in supervised mode. Index Terms: acoustic modeling,
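The super-vector construction and the two ways of obtaining a new super-vector can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-state array layout, the fixed interpolation weights, and the element-wise affine form of the transformation are all assumptions made for illustration. The abstract does not say how the weights or the transformation parameters are estimated from adaptation utterances, so they are simply given here.

```python
import numpy as np

def build_supervector(state_means):
    """Concatenate all Gaussian mean vectors of all HMM states into one vector.

    state_means: list with one array per HMM state, each of shape
    (num_mixtures, feature_dim). This layout is an assumption.
    """
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in state_means])

def interpolate_supervectors(supervectors, weights):
    """Form a new super-vector as a weighted sum of environment super-vectors."""
    stacked = np.stack(supervectors)          # shape (P, D)
    w = np.asarray(weights, dtype=float)      # shape (P,)
    return w @ stacked                        # shape (D,)

def transform_anchor(anchor, scale, bias):
    """Map an anchor super-vector with an element-wise affine transform.

    The affine form is a hypothetical stand-in for the transformation
    mentioned in the abstract.
    """
    return scale * np.asarray(anchor, dtype=float) + bias

# Toy models: 2 HMM states, 2 mixture components each, 3-dimensional features.
clean_means = [np.zeros((2, 3)), np.ones((2, 3))]
noisy_means = [np.full((2, 3), 2.0), np.full((2, 3), 3.0)]

sv_clean = build_supervector(clean_means)     # length 2 * 2 * 3 = 12
sv_noisy = build_supervector(noisy_means)

# Interpolation between two environments with hand-picked weights.
sv_interp = interpolate_supervectors([sv_clean, sv_noisy], [0.25, 0.75])

# Transformation of the clean (anchor) super-vector with a hand-picked shift.
sv_shifted = transform_anchor(sv_clean, scale=1.0, bias=0.5)
```

In this toy setting the interpolated super-vector lies three quarters of the way from the clean means to the noisy means, while the transformed one shifts every clean mean by the same offset. The adapted means would then be unpacked back into the HMMs in the same order in which they were concatenated.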
