Acoustic variability and automatic recognition of children's speech

Authors: Matteo Gerosa, Diego Giuliani, Fabio Brugnara

DOI: 10.1016/J.SPECOM.2007.01.002

Keywords: Vocabulary, Speech recognition, Complement (complexity), Speech processing, Computer science

Abstract: This paper presents several acoustic analyses carried out on read speech collected from Italian children aged 7 to 13 years and North American children aged 5 to 17 years. These analyses aimed at achieving a better understanding of the spectral and temporal changes in speech produced by children of various ages, in view of the development of automatic speech recognition applications. The results of these analyses confirm and complement those reported in the literature, showing that the characteristics of children's speech change with age and that spectral and temporal variability decrease as age increases. In fact, younger children show substantially higher intra- and inter-speaker variability with respect to older children and adults. We investigated the use of several methods for speaker adaptive acoustic modeling to cope with inter-speaker variability and to improve recognition performance for children. These methods proved to be effective in recognition of read speech with a vocabulary of about 11k words.

References (49)
Fabio Brugnara, Matteo Gerosa, Diego Giuliani, Speaker adaptive acoustic modeling with mixture of adult and children's speech. Conference of the International Speech Communication Association. pp. 2193-2196 (2005)
Haizhou Li, C. Santhosh Kumar, V. P. Mohandas, Multilingual Speech Recognition: A Unified Approach. Conference of the International Speech Communication Association. pp. 3357-3360 (2005)
Alexandros Potamianos, Sungbok Lee, Shrikanth S. Narayanan, Automatic speech recognition for children. Conference of the International Speech Communication Association. (1997)
Fabio Brugnara, Maurizio Omologo, Daniele Falavigna, Roberto Gretter, Diego Giuliani, Bianca Angelini, Speaker independent continuous speech recognition using an acoustic-phonetic Italian corpus. Conference of the International Speech Communication Association. (1994)
Fabio Brugnara, Roberto Gretter, Heinrich Niemann, Diego Giuliani, Marcello Federico, Bianca Angelini, Ulla Ackermann, Speedata: a prototype for multilingual spoken data-entry. Conference of the International Speech Communication Association. (1997)
Sadaoki Furui, Koji Iwano, Masanobu Nakamura, Analysis of spectral space reduction in spontaneous speech and its effects on speech recognition performances. Conference of the International Speech Communication Association. pp. 3381-3384 (2005)
Sudha Arunachalam, Elaine Andersen, Shrikanth S. Narayanan, Dani Byrd, Dylan Gould, Politeness and frustration language in child-machine interactions. Conference of the International Speech Communication Association. pp. 2675-2678 (2001)
Fabio Brugnara, Diego Giuliani, Marcello Federico, Mauro Cettolo, Issues in automatic transcription of historical audio data. Conference of the International Speech Communication Association. (2002)
Jack Mostow, Joseph E. Beck, Satanjeev Banerjee, Evaluating the effect of predicting oral reading miscues. Conference of the International Speech Communication Association. (2003)
Jing Zheng, H. Franco, Fuliang Weng, A. Sankar, H. Bratt, Word-level rate of speech modeling using rate-specific phones and pronunciations. International Conference on Acoustics, Speech, and Signal Processing. vol. 3, pp. 1775-1778 (2000), 10.1109/ICASSP.2000.862097