Integration of DNN based speech enhancement and ASR.

Authors: Ramón Fernandez Astudillo, Maria Joana Correia, Isabel Trancoso

Abstract: Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches that integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight solution for robust ASR. In this paper, we therefore explore the integration of DNN-based speech enhancement with ASR by means of Observation Uncertainty techniques. For this purpose, we consider various techniques and approximations that allow propagating the uncertainty of the DNN's inference into the feature domain. This uncertainty can then be used to dynamically compensate the ASR model, using techniques such as uncertainty decoding. We test the proposed techniques on the AURORA4 corpus and show that notable improvements are attained over the already effective DNN-based enhancement.
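
The abstract does not say which propagation approximations the paper uses, so the following is only a generic sketch of the two steps it names, not the authors' method. It assumes the DNN's enhanced estimate for one spectral bin can be treated as a Gaussian, propagates that Gaussian into the feature domain with a scalar unscented transform (a swapped-in, commonly used approximation), and then compensates a Gaussian acoustic-model score by adding the propagated feature variance to the model variance, one common form of uncertainty decoding. The function names, the `log_power` feature transform, and all numbers are hypothetical.

```python
import math


def unscented_transform_1d(f, mu, var, kappa=2.0):
    """Approximate the mean and variance of f(x) for x ~ N(mu, var).

    Scalar unscented transform with three sigma points; exact when f is linear.
    """
    n = 1  # dimensionality of x
    spread = math.sqrt((n + kappa) * var)
    points = [mu, mu + spread, mu - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    values = [f(p) for p in points]
    mean = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, variance


def log_power(amplitude, floor=1e-10):
    """Hypothetical feature transform: log power of a spectral amplitude."""
    return math.log(amplitude * amplitude + floor)


def gaussian_loglik(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Score the expected feature under the model Gaussian, with the model
    variance inflated by the propagated feature variance."""
    return gaussian_loglik(feat_mean, model_mean, model_var + feat_var)


# Hypothetical DNN output for one bin: enhanced amplitude ~ N(0.8, 0.05).
feat_mean, feat_var = unscented_transform_1d(log_power, 0.8, 0.05)

# Hypothetical acoustic-model Gaussian for the same feature dimension.
model_mean, model_var = -2.0, 0.5

conventional = gaussian_loglik(feat_mean, model_mean, model_var)
compensated = uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var)
print(f"feature mean={feat_mean:.3f}, variance={feat_var:.3f}")
print(f"conventional={conventional:.3f}, uncertainty decoding={compensated:.3f}")
```

When the propagated variance is zero, the compensated score reduces to the conventional one, so confident enhancement leaves the recognizer unchanged and only uncertain estimates are de-emphasized.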