Self-learning speaker adaptation based on spectral bias source decomposition, using very short calibration speech

Author: Yunxin Zhao

DOI: 10.1121/1.426675

Keywords: Computer science; Calibration (statistics); Adaptation (computer science); Phone; Dependency (UML); Speaker recognition; Speaker diarisation; Normalization (statistics); Speech recognition; Variation (linguistics); Acoustics

Abstract: A speaker adaptation technique based on the separation of speech spectra variation sources is developed for improving speaker-independent continuous speech recognition. The variation sources include speaker acoustic characteristics and the contextual dependency of allophones. Statistical methods are formulated to normalize speaker acoustic characteristics and then adapt mixture Gaussian density phone models to speaker phonologic characteristics. Adaptation experiments using short calibration speech (5 sec./speaker) have shown substantial performance improvement over the baseline recognition system.

References (14)
John W. Upton, John E. Holmgren, Aaron E. Rosenberg. Speaker identification system using word recognition templates. (1981).
Raj Reddy, Kai-Fu Lee. Large-vocabulary speaker-independent continuous speech recognition: the SPHINX system. Carnegie Mellon University (1988).
S. Furui. Unsupervised speaker adaptation method based on hierarchical spectral clustering. International Conference on Acoustics, Speech, and Signal Processing, pp. 286-289 (1989). DOI: 10.1109/ICASSP.1989.266421
S.J. Cox, J.S. Bridle. Simultaneous speaker normalisation and utterance labelling using Bayesian/neural net techniques. International Conference on Acoustics, Speech, and Signal Processing, pp. 161-164 (1990). DOI: 10.1109/ICASSP.1990.115563
Laurence Gillick. Method for representing word models for use in speech recognition. Journal of the Acoustical Society of America, vol. 92, p. 629 (1989). DOI: 10.1121/1.404106
Y. Zhao, H. Wakita, X. Zhuang. An HMM based speaker-independent continuous speech recognition system with experiments on the TIMIT database. International Conference on Acoustics, Speech, and Signal Processing, pp. 333-336 (1991). DOI: 10.1109/ICASSP.1991.150344
Hiroshi Matsumoto, Hisashi Wakita. Vowel normalization by frequency warped spectral matching. Speech Communication, vol. 5, pp. 239-251 (1986). DOI: 10.1016/0167-6393(86)90011-7
Lalit R. Bahl. Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system. Journal of the Acoustical Society of America, vol. 93, p. 3541 (1990). DOI: 10.1121/1.405346
W.A. Rozzi, R.M. Stern. Speaker adaptation in continuous speech recognition via estimation of correlated mean vectors. International Conference on Acoustics, Speech, and Signal Processing, pp. 865-868 (1991). DOI: 10.1109/ICASSP.1991.150475
C.-H. Lee, C.-H. Lin, B.-H. Juang. A study on speaker adaptation of continuous density HMM parameters. International Conference on Acoustics, Speech, and Signal Processing, pp. 145-148 (1990). DOI: 10.1109/ICASSP.1990.115559