Speech recognition in noisy environments

Author: Pedro J. Moreno

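Abstract: The accuracy of speech recognition systems degrades severely when the systems are operated in adverse acoustical environments. In recent years many approaches have been developed to address the problem of robust speech recognition, using feature-normalization algorithms, microphone arrays, representations based on human hearing, and other approaches. Nevertheless, to date the improvement in recognition accuracy afforded by such algorithms has been limited, in part because of inadequacies in the mathematical models used to characterize the acoustical degradation. This thesis begins with a study of the reasons why speech recognition systems degrade in noise, using Monte Carlo simulation techniques. From observations about these simulations we propose a simple yet effective model of how the environment affects the parameters used to characterize speech recognition systems and their input. The proposed model of environmental degradation is applied to two different approaches to environmental compensation, data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech from direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to use samples of the degraded speech to estimate the parameters of the model. In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation can be applied both to incoming feature vectors and to the stored statistical models used by speech recognition systems. These two approaches to data-driven compensation are referred to as RATZ and STAR, respectively. Finally, we introduce a new approach to model-based compensation with a solution based on vector Taylor series, referred to as the VTS algorithms. The proposed compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database that is corrupted by additive noise artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on recognition accuracy provided by practical compensation algorithms is the recognition accuracy of a system trained on noisy data at that SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also demonstrate that the recognition error rate obtained using the algorithms proposed in this thesis is significantly better than what could be achieved using the previous state of the art. We include a small number of experimental results that indicate that the improvements in recognition accuracy provided by our approaches extend to degraded speech in natural environments as well. We also introduce a generic formulation of the environment compensation problem and its solution via vector Taylor series. We show how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.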
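The evaluation setup described in the abstract, additive noise injected at a chosen global SNR, can be illustrated with a minimal sketch. This is not the thesis's tooling: the function names (`power`, `add_noise_at_snr`, `measure_snr_db`), the synthetic sine "speech" signal, and the white Gaussian noise are all assumptions made for illustration. The only relation used is the standard definition SNR_dB = 10 log10(P_signal / P_noise).

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def measure_snr_db(clean, noise):
    """Global SNR in dB between a clean signal and a noise signal."""
    return 10.0 * math.log10(power(clean) / power(noise))


def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested global SNR, then add it.

    Returns the noisy mixture and the scaled noise that was added.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Solve SNR_dB = 10 * log10(P_clean / P_noise) for the required noise power.
    target_noise_power = power(clean) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Hypothetical stand-ins: a 440 Hz tone sampled at 16 kHz instead of real
# speech, and white Gaussian noise instead of a recorded noise source.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (15, 10, 5):
    noisy, scaled = add_noise_at_snr(clean, noise, target)
    print(target, "dB requested, measured:", measure_snr_db(clean, scaled))
```

The SNR here is global: it is computed over the whole signal, not frame by frame. The 15, 10, and 5 dB targets in the loop are the same SNR levels the abstract quotes for RATZ, VTS, and STAR.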
