Extension of uncertainty propagation to dynamic MFCCs for noise robust ASR

Authors: Dung T. Tran, Emmanuel Vincent, Denis Jouvet

DOI: 10.1109/ICASSP.2014.6854656

Keywords:

Abstract: Uncertainty propagation has been successfully employed for speech recognition in nonstationary noise environments. The uncertainty about the features is typically represented as a diagonal covariance matrix for static features only. We present a framework for estimating the uncertainty over both static and dynamic features as a full covariance matrix. The estimated covariance matrix is then multiplied by scaling coefficients optimized on development data. We achieve a 21% relative error rate reduction on the 2nd CHiME Challenge with respect to conventional decoding without uncertainty, that is five times more than achieved
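The idea of carrying uncertainty from static to dynamic features can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: a single cepstral coefficient, static uncertainties that are independent across frames, simple two-point difference windows for the delta and acceleration features, and one scalar scaling coefficient `alpha`. The names `A`, `propagate`, and `alpha` are hypothetical. Because the dynamic features are linear in the static ones, a covariance D over five consecutive static values maps to A D Aᵀ over the stacked static, delta, and acceleration features.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Assumed linear map from five consecutive static values
# [c[t-2], c[t-1], c[t], c[t+1], c[t+2]] of one cepstral coefficient
# to [static, delta, acceleration] at frame t:
#   delta[t] = (c[t+1] - c[t-1]) / 2
#   accel[t] = (delta[t+1] - delta[t-1]) / 2 = (c[t+2] - 2*c[t] + c[t-2]) / 4
A = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, -0.5, 0.0, 0.5, 0.0],
    [0.25, 0.0, -0.5, 0.0, 0.25],
]


def propagate(static_vars, alpha=1.0):
    """Return alpha * A D A^T, where D = diag(static_vars).

    static_vars: uncertainty variances of the five static values,
                 assumed independent across frames (an assumption).
    alpha:       hypothetical scalar scaling coefficient, standing in for
                 coefficients tuned on development data.
    """
    n = len(static_vars)
    d = [[static_vars[i] if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    cov = matmul(matmul(A, d), transpose(A))
    return [[alpha * v for v in row] for row in cov]


cov = propagate([1.0, 2.0, 4.0, 2.0, 1.0])
for row in cov:
    print(row)
```

Even with diagonal input, the result has a nonzero static-acceleration entry, because the acceleration at frame t depends on the static value at frame t. This is one reason a full covariance matrix over static and dynamic features carries more information than a diagonal one. In uncertainty decoding, a covariance of this kind is commonly added to the covariance of each acoustic-model Gaussian before the likelihood is evaluated.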

References (20)
Reinhold Haeb-Umbach, Dorothea Kolossa. Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications. Springer Publishing Company, Incorporated (2011).
Alex Acero, Mike Plumpe, Li Deng, Xuedong Huang. Large-vocabulary speech recognition under adverse acoustic environments. Conference of the International Speech Communication Association, pp. 806-809 (2000).
John McDonough, Matthias Woelfel. Distant Speech Recognition (2009).
Li Deng. Front-End, Back-End, and Hybrid Techniques for Noise-Robust Speech Recognition. Robust Speech Recognition of Uncertain or Missing Data, pp. 67-99 (2011). DOI: 10.1007/978-3-642-21317-5_4
Alexey Ozerov, Emmanuel Vincent, Frédéric Bimbot. A General Flexible Framework for the Handling of Prior Information in Audio Source Separation. IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 1118-1133 (2012). DOI: 10.1109/TASL.2011.2172425
Alexey Ozerov, Mathieu Lagrange, Emmanuel Vincent. Uncertainty-based learning of acoustic models from noisy data. Computer Speech & Language, vol. 27, pp. 874-894 (2013). DOI: 10.1016/J.CSL.2012.07.002
Raul Kompass. A Generalized Divergence Measure for Nonnegative Matrix Factorization. Neural Computation, vol. 19, pp. 780-791 (2007). DOI: 10.1162/NECO.2007.19.3.780
Dorothea Kolossa, Ramon Fernandez Astudillo, Eugen Hoffmann, Reinhold Orglmeister. Independent component analysis and time-frequency masking for speech recognition in multitalker conditions. EURASIP Journal on Audio, Speech, and Music Processing, vol. 2010, article 651420 (2010). DOI: 10.1155/2010/651420
H. Liao, M. J. F. Gales. Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noisy Data. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 389-392 (2007). DOI: 10.1109/ICASSP.2007.366931
Martin Cooke, Phil Green, Ljubomir Josifovski, Ascension Vizinho. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, vol. 34, pp. 267-285 (2001). DOI: 10.1016/S0167-6393(00)00034-0