Acoustical and environmental robustness in automatic speech recognition

Author: Alejandro Acero

DOI: 10.1007/978-1-4615-3122-7

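Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a higher degree of integration within SPHINX, the Carnegie Mellon speech recognition system, which uses the cepstrum as its feature vector. Therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra that are produced by the feature extraction module in SPHINX. In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed, in which the parameter of the bilinear transformation that is used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.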
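
As a rough illustration of the idea of an SNR-dependent additive correction in the cepstral domain, the following Python sketch may help. It is not the dissertation's implementation. Several things here are assumptions made only for the example: the function names (`snr_bin`, `estimate_corrections`, `compensate`), the use of the zeroth cepstral coefficient minus a noise-floor value as a crude stand-in for instantaneous SNR, the fixed-width SNR bins, and the availability of frame-aligned pairs from the standard and current environments. Under those assumptions, the per-bin mean difference is the correction that minimizes the squared difference between the paired frames.

```python
def snr_bin(frame, noise_c0, bin_width=5.0, n_bins=6):
    """Map a cepstral frame to an SNR bin.

    Assumption: frame[0] minus a noise-floor value acts as a crude
    proxy for instantaneous SNR. Out-of-range values are clamped.
    """
    snr = frame[0] - noise_c0
    idx = int(snr // bin_width)
    return max(0, min(n_bins - 1, idx))


def estimate_corrections(standard, current, noise_c0, bin_width=5.0, n_bins=6):
    """Estimate one additive correction vector per SNR bin.

    Assumption: standard[i] and current[i] are the same frame observed
    in the standard and current environments. Each bin's correction is
    the mean of (standard - current) over the frames falling in that bin;
    bins with no frames get a zero correction.
    """
    dim = len(current[0])
    sums = [[0.0] * dim for _ in range(n_bins)]
    counts = [0] * n_bins
    for x, z in zip(standard, current):
        b = snr_bin(z, noise_c0, bin_width, n_bins)
        counts[b] += 1
        for k in range(dim):
            sums[b][k] += x[k] - z[k]
    return [
        [s / counts[b] if counts[b] else 0.0 for s in sums[b]]
        for b in range(n_bins)
    ]


def compensate(frames, corrections, noise_c0, bin_width=5.0):
    """Add the bin-specific correction vector to each observed frame."""
    n_bins = len(corrections)
    out = []
    for z in frames:
        w = corrections[snr_bin(z, noise_c0, bin_width, n_bins)]
        out.append([zk + wk for zk, wk in zip(z, w)])
    return out


# Toy data: two 2-dimensional "cepstral" frames per environment.
standard = [[10.0, 1.0], [30.0, 2.0]]
current = [[8.0, 0.5], [29.0, 2.5]]
corrections = estimate_corrections(standard, current, noise_c0=5.0)
restored = compensate(current, corrections, noise_c0=5.0)
```

In this toy setup the low-SNR frame and the high-SNR frame land in different bins and therefore receive different correction vectors, which mirrors the abstract's point that the appropriate compensation varies with SNR.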
