Investigations on inter-speaker variability in the feature space

Author: R. Haeb-Umbach

DOI: 10.1109/ICASSP.1999.758146

Abstract: We apply Fisher variate analysis to measure the effectiveness of speaker normalization techniques. A trace criterion, which measures the ratio of the variations due to different phonemes compared to the variations due to different speakers, serves as a first assessment of a feature set without the need for recognition experiments. By using this measure and by recognition experiments we demonstrate that cepstral mean normalization also has a speaker normalization effect, in addition to the well-known channel normalization effect. Similarly, vocal tract normalization (VTN) is shown to remove inter-speaker variability. For VTN we show that normalization on a per-sentence basis performs better than normalization on a per-speaker basis. Recognition results are given on the Wall Street Journal and Hub-4 databases.
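The abstract does not give the exact form of the trace criterion, so the following is only a minimal sketch under stated assumptions. It takes the criterion to be tr(S_speaker^-1 S_phoneme), a common Fisher-style scatter ratio, and uses synthetic two-dimensional "features" built from a phoneme mean, a per-speaker offset, and noise. The function names (`scatter_matrices`, `trace_criterion`, `mean_normalize`) and all data are hypothetical and not taken from the paper.

```python
import numpy as np


def scatter_matrices(features, phoneme_ids, speaker_ids):
    """Return (S_phoneme, S_speaker) for frame-level features of shape (N, D).

    Assumed definitions (not from the paper):
    S_phoneme: scatter of phoneme means around the global mean.
    S_speaker: scatter of per-speaker phoneme means around the phoneme mean.
    """
    global_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_phoneme = np.zeros((dim, dim))
    s_speaker = np.zeros((dim, dim))
    for p in np.unique(phoneme_ids):
        in_p = phoneme_ids == p
        mean_p = features[in_p].mean(axis=0)
        d = (mean_p - global_mean)[:, None]
        s_phoneme += in_p.sum() * (d @ d.T)
        for s in np.unique(speaker_ids[in_p]):
            cell = in_p & (speaker_ids == s)
            e = (features[cell].mean(axis=0) - mean_p)[:, None]
            s_speaker += cell.sum() * (e @ e.T)
    n = len(features)
    return s_phoneme / n, s_speaker / n


def trace_criterion(features, phoneme_ids, speaker_ids):
    """tr(S_speaker^-1 S_phoneme); larger means less speaker variability
    relative to phoneme variability."""
    s_phoneme, s_speaker = scatter_matrices(features, phoneme_ids, speaker_ids)
    return float(np.trace(np.linalg.solve(s_speaker, s_phoneme)))


def mean_normalize(features, group_ids):
    """Subtract the mean of each group (e.g. speaker or sentence) from its frames."""
    out = features.astype(float).copy()
    for g in np.unique(group_ids):
        mask = group_ids == g
        out[mask] -= out[mask].mean(axis=0)
    return out


# Synthetic data: phoneme mean + speaker offset + small noise (all hypothetical).
rng = np.random.default_rng(0)
n_phonemes, n_speakers, n_frames, dim = 3, 4, 50, 2
phoneme_means = rng.normal(size=(n_phonemes, dim))
speaker_offsets = rng.normal(size=(n_speakers, dim))
phoneme_ids = np.repeat(np.arange(n_phonemes), n_speakers * n_frames)
speaker_ids = np.tile(np.repeat(np.arange(n_speakers), n_frames), n_phonemes)
features = (phoneme_means[phoneme_ids] + speaker_offsets[speaker_ids]
            + 0.1 * rng.normal(size=(len(phoneme_ids), dim)))

before = trace_criterion(features, phoneme_ids, speaker_ids)
after = trace_criterion(mean_normalize(features, speaker_ids),
                        phoneme_ids, speaker_ids)
print(f"criterion before: {before:.2f}, after mean normalization: {after:.2f}")
```

In this toy setup the per-speaker mean subtraction removes the additive speaker offset, so the criterion rises. That mirrors the qualitative claim about cepstral mean normalization, but it says nothing about the magnitudes reported in the paper.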

References (8)
Hynek Hermansky, Perceptual linear predictive (PLP) analysis of speech, Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752, 1990, 10.1121/1.399423
S. Wegmann, D. McAllaster, J. Orloff, B. Peskin, Speaker normalization on conversational telephone speech, International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 339-341, 1996, 10.1109/ICASSP.1996.541101
Li Lee, R.C. Rose, Speaker normalization using efficient frequency warping procedures, International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 353-356, 1996, 10.1109/ICASSP.1996.541105
R. Haeb-Umbach, D. Geller, H. Ney, Improvements in connected digit recognition using linear discriminant analysis and mixture densities, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 239-242, 1993, 10.1109/ICASSP.1993.319279
H. Hermansky, N. Morgan, RASTA processing of speech, IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 578-589, 1994, 10.1109/89.326616
Y. Chen, Cepstral domain talker stress compensation for robust speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, pp. 433-439, 1988, 10.1109/29.1547
L. Welling, R. Haeb-Umbach, X. Aubert, N. Haberland, A study on speaker normalization using vocal tract normalization and speaker adaptive training, International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 797-800, 1998, 10.1109/ICASSP.1998.675385
Richard O. Duda, Peter E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, 1973