Study of time and frequency variability in pathological speech and error reduction methods for automatic speech recognition.

Authors: Oscar Saz, Eduardo Lleida, Antonio Miguel, Luis Buera, Alfonso Ortega

Abstract: In this work, we study the variations in the time and frequency domains inside a Spanish language corpus of speakers with non-pathological and pathological speech. We show that pathological speech has greater variability in the duration of words than non-pathological speech, while in the frequency domain vowel confusability increases by 18%. The baseline experiments in Automatic Speech Recognition (ASR) demonstrate that this variability causes a loss of performance in ASR systems. To reduce its impact, we use a recent Vocal Tract Length Normalization (VTLN) system, MATE (augMented stAte space acousTic modEl), as a way of improving ASR systems when dealing with speakers who suffer from any kind of speech pathology. Experiments show WER reductions of 17.04% and 11.19% using [...] and [...], respectively.

References (7)
Pam Enderby, Mark S. Hawley, Phil D. Green, James Carmichael, Athanassios Hatzis, Mark Parker. Automatic speech recognition with sparse training data for dysarthric speakers. Conference of the International Speech Communication Association (2003).
Eduardo Lleida, Antonio Miguel, Luis Buera, Alfonso Ortega, Richard C. Rose. Augmented state space acoustic decoding for modeling local variability in speech. Conference of the International Speech Communication Association, pp. 3009-3012 (2005).
Karen Croot. An acoustic analysis of vowel production across tasks in a case of non-fluent progressive aphasia. Conference of the International Speech Communication Association (1998).
J.R. Deller, D. Hsu, L.J. Ferrier. On the use of hidden Markov modelling for recognition of dysarthric speech. Computer Methods and Programs in Biomedicine, vol. 35, pp. 125-139 (1991). doi:10.1016/0169-2607(91)90071-Z
Frederic L. Darley, Arnold E. Aronson, Joe R. Brown. Differential Diagnostic Patterns of Dysarthria. Journal of Speech and Hearing Research, vol. 12, pp. 246-269 (1969). doi:10.1044/JSHR.1202.246
L. Lee, R. Rose. A frequency warping approach to speaker normalization. IEEE Transactions on Speech and Audio Processing, vol. 6, pp. 49-60 (1998). doi:10.1109/89.650310
M. Pitz, H. Ney. Vocal tract normalization equals linear transformation in cepstral space. IEEE Transactions on Speech and Audio Processing, vol. 13, pp. 930-944 (2005). doi:10.1109/TSA.2005.848881