Fusion of multiple uncertainty estimators and propagators for noise robust ASR

Authors: Dung T. Tran, Emmanuel Vincent, Denis Jouvet

DOI: 10.1109/ICASSP.2014.6854657

Abstract: Uncertainty decoding has been successfully used for speech recognition in highly nonstationary noise environments. Yet accurate estimation of the uncertainty on the denoised signals and its propagation to the features remain difficult. In this work, we propose to fuse the uncertainty estimates obtained from different uncertainty estimators and propagators by linear combination. The fusion coefficients are optimized by minimizing a measure of divergence with oracle estimates on development data. Using the Kullback-Leibler divergence, we obtain an 18% relative error rate reduction on the 2nd CHiME Challenge with respect to conventional decoding, which is about twice the reduction achieved by the best single uncertainty estimator and propagator.
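
The linear fusion described in the abstract can be illustrated with a toy sketch. The abstract does not specify the estimators, the propagators, or the optimizer, so everything below is an assumption made for illustration: the uncertainty estimates are random positive numbers, the oracle is synthesized from hypothetical weights, and the weights are fitted with a multiplicative update of the kind used in nonnegative matrix factorization, chosen here because it keeps the weights nonnegative and does not increase the generalized Kullback-Leibler divergence. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def kl_divergence(target, estimate):
    """Generalized Kullback-Leibler divergence between positive sequences."""
    return sum(t * math.log(t / e) - t + e for t, e in zip(target, estimate))


def fuse(weights, estimates):
    """Linear combination of K estimate sequences, each of length N."""
    n = len(estimates[0])
    return [sum(w * est[i] for w, est in zip(weights, estimates)) for i in range(n)]


def learn_fusion_weights(estimates, oracle, n_iter=200):
    """Fit nonnegative fusion weights with multiplicative updates (assumed optimizer)."""
    k = len(estimates)
    weights = [1.0 / k] * k  # start from a uniform combination
    for _ in range(n_iter):
        fused = fuse(weights, estimates)
        # Each weight is scaled by a positive ratio, so it stays nonnegative.
        weights = [
            w * sum(e * o / f for e, o, f in zip(est, oracle, fused)) / sum(est)
            for w, est in zip(weights, estimates)
        ]
    return weights


# Toy "development data": three synthetic estimators and a synthetic oracle.
random.seed(0)
estimates = [[random.uniform(0.1, 2.0) for _ in range(50)] for _ in range(3)]
true_weights = [0.2, 0.5, 0.3]  # hypothetical, used only to build the oracle
oracle = fuse(true_weights, estimates)

weights = learn_fusion_weights(estimates, oracle)
before = kl_divergence(oracle, fuse([1.0 / 3] * 3, estimates))
after = kl_divergence(oracle, fuse(weights, estimates))
print("learned weights:", weights)
print("divergence before and after fitting:", before, after)
```

In this sketch the weights are learned once on the toy development data and would then be reused to fuse estimates on unseen data; whether the paper ties weights across frequency bins or feature dimensions is not stated in the abstract.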
