Author: Pedro J. Moreno
DOI:
Keywords:
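Abstract: The accuracy of speech recognition systems degrades severely when the systems are operated in adverse acoustical environments. In recent years many approaches have been developed to address the problem of robust speech recognition, using feature-normalization algorithms, microphone arrays, representations based on human hearing, and other approaches. Nevertheless, to date the improvement in recognition accuracy afforded by such algorithms has been limited, in part because of inadequacies in the mathematical models used to characterize the acoustical degradation. This thesis begins with a study of the reasons why speech recognition systems degrade in noise, using Monte Carlo simulation techniques. From observations about these simulations we propose a simple yet effective model of how the environment affects the parameters used to characterize speech recognition systems and their input. The proposed model of environmental degradation is applied to two different approaches to environmental compensation, data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech from direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to use samples of the degraded speech to estimate the parameters of the model. In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation can be applied both to incoming feature vectors and to the stored statistical models used by speech recognition systems. These two approaches to data-driven compensation are referred to as RATZ and STAR, respectively. Finally, we introduce a new approach to model-based compensation with a solution based on vector Taylor series, referred to as the VTS algorithms. The proposed compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database that is corrupted by additive noise artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on recognition accuracy provided by practical compensation algorithms is the recognition accuracy of a system trained on noisy data at that SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also demonstrate that the recognition error rate obtained using the algorithms proposed in this thesis is significantly better than what could be achieved using the previous state of the art. We include a small number of experimental results that indicate that the improvements in recognition accuracy provided by our approaches extend to degraded speech in natural environments as well. We also introduce a generic formulation of the environmental compensation problem and its solution via vector Taylor series. We show how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.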