Authors: Yuuki Tachioka, Shinji Watanabe
DOI:
Keywords:
Abstract: Speech enhancement is an important front-end technique to improve automatic speech recognition (ASR) in noisy environments. However, incorrect noise suppression often causes additional distortions in the speech signals, which degrades ASR performance. To compensate for these distortions, ASR needs to consider the uncertainty of the enhanced features, which can be achieved by taking the expectation of the decoding/training process with respect to a probabilistic representation of the input features. However, unlike the Gaussian mixture model, it is difficult for a Deep Neural Network (DNN) to deal with this expectation analytically due to its nonlinear activations. This paper proposes efficient Monte-Carlo approximation methods for this expectation calculation to realize DNN-based uncertainty decoding and training. It first models the uncertainty of the input features with a linear interpolation between the original and enhanced feature vectors with a random interpolation coefficient. By sampling input features based on this stochastic process in training, the DNN can learn to generalize to the variations of the enhanced features. Our method also samples input features in decoding, and integrates multiple hypotheses obtained from the samples. Experiments on reverberated speech recognition tasks (the second CHiME and REVERB challenges) show the effectiveness of our techniques.
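The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform distribution for the interpolation coefficient, the single-layer `toy_posterior` stand-in for the DNN, and the averaging of posteriors as a simple way to integrate the samples are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import random


def sample_features(original, enhanced, rng):
    """Draw one feature vector on the line between original and enhanced."""
    # Assumption: one coefficient per vector, drawn uniformly from [0, 1).
    alpha = rng.random()
    return [alpha * e + (1.0 - alpha) * o for o, e in zip(original, enhanced)]


def toy_posterior(features, weights):
    """Hypothetical stand-in for a DNN: a single softmax layer."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


def monte_carlo_posterior(original, enhanced, weights, num_samples=8, seed=0):
    """Approximate the expected posterior by averaging over sampled inputs."""
    rng = random.Random(seed)
    sums = [0.0] * len(weights)
    for _ in range(num_samples):
        sampled = sample_features(original, enhanced, rng)
        probs = toy_posterior(sampled, weights)
        sums = [s + p for s, p in zip(sums, probs)]
    return [s / num_samples for s in sums]


# Toy two-dimensional features and a two-class model (illustrative values).
original = [1.0, 0.5]
enhanced = [0.2, 0.9]
weights = [[1.0, -1.0], [-1.0, 1.0]]
averaged = monte_carlo_posterior(original, enhanced, weights)
```

The same `sample_features` call could be applied to each training example to expose the network to variations between the original and enhanced inputs. How the paper actually combines hypotheses at decoding time is not specified in the abstract, so the posterior averaging above should be read as one possible choice.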