Speaker normalization for speech recognition

Author: X. Huang

DOI: 10.1109/ICASSP.1992.225871

Keywords: Computer science; Normalization (statistics); Speech recognition; Speech synthesis; Speaker recognition; Nonlinear system; Artificial intelligence; Word error rate; Loudspeaker; Artificial neural network; Training set; Pattern recognition

Abstract: A codeword-dependent neural network (CDNN) is presented for the study of speaker adaptation. The CDNN is used as a nonlinear mapping function to transform speech data between two speakers. It is characterized by a number of important properties. First, the assembly of mapping functions enhances overall mapping quality. Second, multiple input vectors are used simultaneously in the transformation. This not only makes full use of dynamic information but also alleviates possible errors in the supervision data. Finally, the mapping function is derived from training data, with its quality dependent on the amount of training data available. Based on speaker-dependent models, performance evaluation showed that speaker normalization significantly reduced the error rate from 41.9% to 5.0%.
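
To make the idea in the abstract concrete, the following is a minimal sketch of a codeword-dependent mapping, not the paper's implementation. It assumes a vector-quantization codebook, one small network per codeword, and a symmetric context window of neighbouring frames as the "multiple input vectors". The function names (`nearest_codeword`, `mlp_forward`, `cdnn_map`), the single tanh hidden layer, the edge padding, and the `context` parameter are all illustrative assumptions; the abstract does not specify any of them.

```python
import math


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean)."""
    def dist(codeword):
        return sum((x - y) ** 2 for x, y in zip(frame, codeword))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def mlp_forward(x, net):
    """One tanh hidden layer followed by a linear output layer (assumed shape)."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(net["W2"], net["b2"])]


def cdnn_map(frames, codebook, nets, context=1):
    """Map each source-speaker frame with the network tied to its nearest codeword.

    The network input is the current frame plus `context` neighbours on each
    side, concatenated; sequence edges are padded by repeating the first or
    last frame. This padding choice is an assumption of the sketch.
    """
    mapped = []
    last = len(frames) - 1
    for t, frame in enumerate(frames):
        window = []
        for dt in range(-context, context + 1):
            window.extend(frames[min(max(t + dt, 0), last)])
        k = nearest_codeword(frame, codebook)  # selects the codeword-specific network
        mapped.append(mlp_forward(window, nets[k]))
    return mapped
```

In this sketch each entry of `nets` is a dict with weight matrices `W1`, `W2` and bias vectors `b1`, `b2`; in practice these would be trained on paired data from the two speakers, which is outside the scope of the example.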
