Connectionist speaker normalization and adaptation.

Authors: Victor Abrash, Michael Cohen, Horacio Franco, Ananth Sankar

Abstract: In a speaker-independent, large-vocabulary continuous speech recognition system, recognition accuracy varies considerably from speaker to speaker, and performance may be significantly degraded for outlier speakers such as nonnative talkers. In this paper, we explore supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model/multilayer perceptron version of SRI's DECIPHER(TM) speech recognition system. Normalization is implemented through an additional transformation network that preprocesses the cepstral input to the MLP. Adaptation is accomplished through incremental retraining of the MLP weights on adaptation data. Our approach combines both adaptation and normalization in a single, consistent manner, works with limited adaptation data, and is text-independent. We show significant improvement in recognition accuracy.
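The normalization idea in the abstract (a transformation network placed in front of a fixed MLP) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the transformation is taken to be a single affine layer initialized to the identity, the "speaker-independent" MLP is a small randomly initialized stand-in, and the adaptation data is synthetic. The names `FrozenMLP`, `train_normalizer`, and `make_toy_data` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class FrozenMLP:
    """Hypothetical stand-in for a speaker-independent MLP; its weights are never updated."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, softmax(h @ self.W2 + self.b2)

    def input_grad(self, h, probs, targets):
        # Gradient of the mean cross-entropy with respect to the MLP input.
        d_logits = (probs - targets) / len(targets)
        d_pre = (d_logits @ self.W2.T) * (1.0 - h ** 2)  # back through tanh
        return d_pre @ self.W1.T

def cross_entropy(probs, targets):
    return float(-np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1)))

def train_normalizer(mlp, feats, targets, lr=0.01, epochs=300):
    """Fit an affine input transform x -> x @ A + b while the MLP stays fixed."""
    n = feats.shape[1]
    A, b = np.eye(n), np.zeros(n)  # identity start: initially a no-op
    losses = []
    for _ in range(epochs):
        h, probs = mlp.forward(feats @ A + b)
        losses.append(cross_entropy(probs, targets))
        g = mlp.input_grad(h, probs, targets)
        A -= lr * (feats.T @ g)    # chain rule through x = feats @ A + b
        b -= lr * g.sum(axis=0)
    return A, b, losses

def make_toy_data(mlp, rng, n_frames=400, n_in=13):
    """Synthetic 'speaker' data: labels come from the MLP on clean frames,
    then the frames are distorted by an assumed affine speaker mismatch."""
    canonical = rng.normal(size=(n_frames, n_in))
    _, probs = mlp.forward(canonical)
    targets = np.eye(probs.shape[1])[probs.argmax(axis=1)]
    distorted = 1.3 * canonical + 0.5
    return distorted, targets

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mlp = FrozenMLP(13, 32, 5, rng)
    feats, targets = make_toy_data(mlp, rng)
    A, b, losses = train_normalizer(mlp, feats, targets)
    print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

The sketch covers only the normalization half. The adaptation step described in the abstract, incremental retraining of the MLP weights themselves, would additionally update `W1`, `b1`, `W2`, and `b2` on the same adaptation data.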

References (8)
David E. Rumelhart, Victor Abrash, Michael Cohen, Horacio Franco, Nelson Morgan. Hybrid neural network/hidden Markov model continuous-speech recognition. Conference of the International Speech Communication Association (1992).
V. Digalakis, L. Neumeyer. Speaker adaptation using combined transformation and Bayesian methods. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 680-683 (1995). DOI: 10.1109/ICASSP.1995.479785
Francis Kubala, Roni Rosenfeld, Bob Roth, Mitch Weintraub, Jerome Bellegarda, Jordan Cohen, David Pallett, Doug Paul, Mike Phillips, Raja Rajasekaran, Fred Richardson, Michael Riley. The hub and spoke paradigm for CSR evaluation. Proceedings of the Workshop on Human Language Technology - HLT '94, pp. 37-42 (1994). DOI: 10.3115/1075812.1075822
R.L. Watrous. Speaker normalization and adaptation using second-order connectionist networks. IEEE Transactions on Neural Networks, vol. 4, pp. 21-30 (1993). DOI: 10.1109/72.182692
A. Sankar, Chin-Hui Lee. Stochastic matching for robust speech recognition. IEEE Signal Processing Letters, vol. 1, pp. 124-125 (1994). DOI: 10.1109/97.311815
V.V. Digalakis, D. Rtischev, L.G. Neumeyer. Speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 357-366 (1995). DOI: 10.1109/89.466659
V. Digalakis, H. Murveit. Genones: optimizing the degree of mixture tying in a large vocabulary hidden Markov model based speech recognizer. Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 537-540 (1994). DOI: 10.1109/ICASSP.1994.389212
S. Renals, N. Morgan, M. Cohen, H. Franco. Connectionist probability estimation in the DECIPHER speech recognition system. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 601-604 (1992). DOI: 10.1109/ICASSP.1992.225837