On the development of matched and mismatched Italian children's speech recognition systems.

Author: Piero Cosi

Abstract: While at least read speech corpora are available for Italian children’s speech research, many languages completely lack children’s speech corpora. We propose that learning statistical mappings between the adult and child acoustic spaces from existing adult/children corpora may provide a future direction for generating children’s models for such data-deficient languages. This work describes recent advances in the development of the SONIC Italian children’s speech recognition system. Completing an earlier study, it was conducted with the specific goal of integrating the newly trained children’s speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, the research used the complete training and test sets of the FBK (ex ITC-irst) Children’s Speech Corpus (ChildIt). Using the University of Colorado SONIC LVSR system, we demonstrate a phonetic error rate of 12.0% for a system that incorporates Vocal Tract Length Normalization (VTLN), Speaker-Adaptive Trained models, and unsupervised Structural MAP Linear Regression (SMAPLR).
