Model space size scaling for speaker adaptation

Author: Mats Blomberg

Abstract: In the current work, instantaneous adaptation in speech recognition is performed by estimating speaker properties, which modify the original trained acoustic models. A new property, the size of the model space, is added to the previously used features, VTLN and spectral slope. These are jointly estimated for each test utterance. The new feature has been shown to be effective for recognition of children's speech using adult-trained models in TIDIGITS. Adding the feature lowered the error rate by around 10% relative. The overall combination of VTLN, spectral slope and model space scaling represents a substantial 31% relative reduction compared with single VTLN. There was no improvement among adult speakers in TIDIGITS and TIMIT. Improvement for this category is expected when training and test sets are recorded in different conditions, such as read and spontaneous speech.
