Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer

Authors: Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe

DOI: 10.1109/ICASSP.2008.4518549

Abstract: It is well known that automatic speech recognition performs poorly in the presence of noise or reverberation. Much research has been undertaken on model adaptation and speech enhancement to increase the robustness of speech recognizers. Model adaptation is effective at removing static mismatch between speech features and acoustic model parameters, but may not cope well with dynamic mismatch. Speech enhancement approaches can reduce dynamic perturbations, but often do not interconnect well with the speech recognizer. There seems to be a lack of an optimal way to combine these two approaches. In this paper we propose introducing dynamic capabilities into a model adaptation scheme. We focus on variance adaptation and propose a novel parametric variance model that includes static and dynamic components. The dynamic component is derived from the speech enhancement pre-process, and the model parameters are optimized using an adaptive training scheme. An evaluation of the method with speech dereverberation for preprocessing revealed that an 80% relative error rate reduction was possible compared with the recognition of dereverberated speech, and the final error rate was 5.4%, which is close to that of clean speech (1.2%).
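
To make the idea of a combined static and dynamic variance concrete, here is a minimal sketch in Python. The abstract does not give the parametric form, so everything below is an assumption for illustration only: the adapted variance is taken to be a static rescaling of the acoustic model variance (weight `lam`) plus a dynamic term proportional to the squared difference between the observed and the enhanced feature (weight `alpha`). The names `adapted_variance`, `log_likelihood`, `lam`, and `alpha` are hypothetical and do not come from the paper, and the adaptive training of the weights is not shown.

```python
import math


def adapted_variance(model_var, observed, enhanced, lam=1.0, alpha=1.0):
    """Per-dimension variance with a static and a dynamic part.

    ASSUMED form (not taken from the paper):
        var_t = lam * model_var + alpha * (observed_t - enhanced_t) ** 2
    The static part rescales the model variance; the dynamic part grows
    on frames where the enhancement pre-processor changed the feature a lot.
    """
    return [lam * v + alpha * (o - e) ** 2
            for v, o, e in zip(model_var, observed, enhanced)]


def log_likelihood(enhanced, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of one feature frame."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(enhanced, mean, var))


# Usage: score one enhanced frame against one Gaussian with adapted variance.
model_mean = [0.0, 1.0]
model_var = [1.0, 2.0]
observed = [0.5, 1.0]   # feature before enhancement
enhanced = [0.5, 0.0]   # feature after enhancement
var_t = adapted_variance(model_var, observed, enhanced, lam=2.0, alpha=0.5)
score = log_likelihood(enhanced, model_mean, var_t)
```

Under this assumed form, a frame that the pre-processor modified heavily is scored with a wider Gaussian, so it influences decoding less than a frame the pre-processor left nearly unchanged.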

References (8)
T. Hori, "NTT Speech recognizer with OutLook On the Next generation: SOLON," Proc. NTT Workshop on Communication Scene Analysis, 2004.
Alex Acero, Mike Plumpe, Li Deng, Xuedong Huang, "Large-vocabulary speech recognition under adverse acoustic environments," Conference of the International Speech Communication Association, pp. 806-809, 2000.
M.J.F. Gales, P.C. Woodland, "Mean and variance adaptation within the MLLR framework," Computer Speech & Language, vol. 10, pp. 249-264, 1996. DOI: 10.1006/CSLA.1996.0013
M.J.F. Gales, S.J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Transactions on Speech and Audio Processing, vol. 4, pp. 352-359, 1996. DOI: 10.1109/89.536929
A. Sankar, Chin-Hui Lee, "A maximum-likelihood approach to stochastic matching for robust speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 4, pp. 190-202, 1996. DOI: 10.1109/89.496215
R.C. Rose, E.M. Hofstetter, D.A. Reynolds, "Integrated models of signal and background with application to speaker identification in noise," IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 245-257, 1994. DOI: 10.1109/89.279273
Li Deng, J. Droppo, A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Transactions on Speech and Audio Processing, vol. 13, pp. 412-421, 2005. DOI: 10.1109/TSA.2005.845814
Dorothea Kolossa, Hiroshi Sawada, Ramon Fernandez Astudillo, Reinhold Orglmeister, Shoji Makino, "Recognition of Convolutive Speech Mixtures by Missing Feature Techniques for ICA," Asilomar Conference on Signals, Systems and Computers, pp. 1397-1401, 2006. DOI: 10.1109/ACSSC.2006.354987