Log-Domain Speech Feature Enhancement Using Sequential MAP Noise Estimation and a Phase-sensitive Model of the Acoustic Environment

Authors: Alex Acero, Li Deng, Jasha Droppo

Abstract: In this paper we present an MMSE (minimum mean square error) speech feature enhancement algorithm, capitalizing on a new probabilistic, nonlinear environment model that effectively incorporates the phase relationship between the clean speech and the corrupting noise in acoustic distortion. The MMSE estimator based on this phase-sensitive model is derived, and it achieves high efficiency by exploiting a single-point Taylor series expansion to approximate the joint probability of clean and noisy speech as a multivariate Gaussian. As an integral component of the enhancement algorithm, we also present a new sequential MAP-based nonstationary noise estimator. Experimental results on the Aurora2 task demonstrate the importance of exploiting the phase relationship in the speech corruption process captured by the MMSE estimator. The phase-sensitive MMSE estimator reported in this paper performs significantly better than phase-insensitive spectral subtraction (54% error rate reduction), and also noticeably better than our previous state-of-the-art technique reported in [2] (7% error rate reduction), under otherwise identical experimental conditions of speech feature enhancement and recognition.

References (6)
Alex Acero, Mike Plumpe, Li Deng, Xuedong Huang. Large-vocabulary speech recognition under adverse acoustic environments. Conference of the International Speech Communication Association, pp. 806-809 (2000).
Trausti T. Kristjansson, Brendan J. Frey, Alex Acero, Li Deng. ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition. Conference of the International Speech Communication Association, pp. 901-904 (2001).
Alex Acero, Li Deng, Jasha Droppo. Evaluation of the SPLICE algorithm on the Aurora2 database. Conference of the International Speech Communication Association, pp. 217-220 (2001).
Li Deng, A. Acero, Li Jiang, J. Droppo, Xuedong Huang. High-performance robust speech recognition using stereo training data. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 301-304 (2001). doi:10.1109/ICASSP.2001.940827
Li Deng, Jasha Droppo, Alex Acero. A Bayesian approach to speech feature enhancement using the dynamic cepstral prior. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 829-832 (2002). doi:10.1109/ICASSP.2002.5743867
P.J. Moreno, B. Raj, R.M. Stern. A vector Taylor series approach for environment-independent speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 733-736 (1996). doi:10.1109/ICASSP.1996.543225