Signal adaptive spectral envelope estimation for robust speech recognition

Author: Matthias Wölfel

DOI: 10.1016/J.SPECOM.2009.02.006

Abstract: This paper describes a novel spectral envelope estimation technique which adapts to the characteristics of the observed signal. This is possible via the introduction of a second bilinear transformation into warped minimum variance distortionless response (MVDR) spectral envelope estimation. As opposed to the first transformation, however, which is applied in the time domain, the second must be applied in the frequency domain. This extension enables the resolution of the estimate to be steered to lower or higher frequencies, while keeping the overall resolution and the frequency axis fixed. When embedded in the feature extraction process of an automatic speech recognition system, it provides for the emphasis of features that are relevant for robust classification, while simultaneously suppressing those that are irrelevant for classification. The change in resolution may be steered, for each observation window, by the normalized autocorrelation coefficient. To evaluate the proposed adaptive technique, dubbed warped-twice MVDR, we use two objective functions: class separability and word error rate. Our test set consists of the development and evaluation data as provided by NIST for the Rich Transcription 2005 Spring Meeting Recognition Evaluation. For both measures, consistent improvements are observed for several speaker-to-microphone distances. On average, over all distances, the proposed front-end reduces the word error rate by 4% relative compared to the widely used mel-frequency cepstral coefficients as well as perceptual linear prediction.
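The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the standard first-order all-pass (bilinear) frequency mapping and a plain lag-1 normalized autocorrelation. The function names, the choice of lag 1, and the warp factor `alpha` are hypothetical choices for illustration. How the autocorrelation value is mapped to a change in resolution is not stated in the abstract, so that step is left out.

```python
import math


def bilinear_warp(omega, alpha):
    """Map a frequency omega (radians, 0..pi) to its warped value.

    Uses the phase response of a first-order all-pass filter with
    warp factor alpha, where |alpha| < 1. alpha = 0 leaves the axis
    unchanged; alpha > 0 stretches the low-frequency region.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def normalized_autocorrelation(frame, lag=1):
    """Autocorrelation of one observation window at the given lag,
    divided by the lag-0 value (the frame energy)."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0  # silent frame: no meaningful correlation
    r_lag = sum(a * b for a, b in zip(frame, frame[lag:]))
    return r_lag / r0


if __name__ == "__main__":
    # Warped positions of a few frequencies for an assumed alpha of 0.4.
    for omega in (0.0, math.pi / 4, math.pi / 2, math.pi):
        print(round(omega, 3), round(bilinear_warp(omega, 0.4), 3))
```

In this sketch the endpoints 0 and pi map to themselves, so the frequency axis keeps its overall extent while resolution is redistributed inside it. A slowly varying frame gives a normalized autocorrelation closer to 1 and a rapidly alternating one gives a value closer to -1, which is the kind of per-window quantity that could act as a steering signal.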
