Signal adaptive spectral envelope estimation for robust speech recognition

DOI: 10.1016/J.SPECOM.2009.02.006

Abstract: This paper describes a novel spectral envelope estimation technique which adapts to the characteristics of the observed signal. This is possible via the introduction of a second bilinear transformation into warped minimum variance distortionless response (MVDR) spectral envelope estimation. As opposed to the first bilinear transformation, however, which is applied in the time domain, the second must be applied in the frequency domain. This extension enables the resolution of the spectral envelope estimate to be steered to lower or higher frequencies, while keeping the overall resolution and the frequency axis fixed. When embedded in the feature extraction process of an automatic speech recognition system, it provides for the emphasis of features that are relevant for robust classification, while simultaneously suppressing those that are irrelevant for classification. The change in resolution may be steered, for each observation window, by the normalized autocorrelation coefficient. To evaluate the proposed adaptive spectral envelope technique, dubbed warped-twice MVDR, we use two objective functions: class separability and word error rate. Our test set consists of the development and evaluation data as provided by NIST for the Rich Transcription 2005 Spring Meeting Recognition Evaluation. For both measures, we observe consistent improvements for several speaker-to-microphone distances. On average, over all distances, the proposed front-end reduces the word error rate by 4% relative compared to the widely used mel-frequency cepstral coefficients as well as perceptual linear prediction.
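The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the warp parameter `alpha` are assumptions made for illustration, and the rule that maps the autocorrelation coefficient to a change in resolution is not given in the abstract, so it is left out. The first function evaluates the well-known frequency mapping of a single bilinear transformation (the phase response of a first-order all-pass filter); the second computes a normalized autocorrelation coefficient for one observation window.

```python
import math


def warp_frequency(omega, alpha):
    """Map a normalized frequency omega (radians, 0..pi) through a bilinear
    transformation with warp parameter alpha (assumed name, |alpha| < 1).

    This is the phase response of a first-order all-pass filter:
        omega_warped = omega + 2 * atan(alpha*sin(omega) / (1 - alpha*cos(omega)))
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def normalized_autocorrelation(frame, lag=1):
    """Return r[lag] / r[0] for one observation window (a list of samples).

    Values near +1 indicate energy concentrated at low frequencies,
    values near -1 indicate energy concentrated at high frequencies.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0  # silent frame: no meaningful coefficient
    r_lag = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return r_lag / r0


if __name__ == "__main__":
    # alpha = 0 leaves the frequency axis unchanged; alpha > 0 stretches low frequencies.
    for alpha in (0.0, 0.4):
        print(alpha, [round(warp_frequency(w, alpha), 3)
                      for w in (0.0, math.pi / 4, math.pi / 2, math.pi)])
    # A slowly varying frame versus a rapidly alternating frame.
    print(normalized_autocorrelation([1.0, 1.0, 1.0, 1.0]))
    print(normalized_autocorrelation([1.0, -1.0, 1.0, -1.0]))
```

A positive `alpha` maps interior frequencies upward while leaving 0 and pi fixed, which is the sense in which a bilinear transformation redistributes resolution along the frequency axis. How the paper applies the second transformation in the frequency domain, and how it steers it per window, is described in the full text rather than in this abstract.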


