Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR

Authors: Astrid Hagen, Andrew Morris

DOI: 10.1016/J.CSL.2003.12.002

Abstract: In this article we review several successful extensions to the standard hidden-Markov-model/artificial neural network (HMM/ANN) hybrid, which have recently made important contributions to the field of noise robust automatic speech recognition. The first extension to the standard hybrid was the “multi-band hybrid”, in which a separate ANN is trained on each frequency sub-band, followed by some form of weighted combination of the ANN state posterior probability outputs prior to decoding. However, due to the inaccurate assumption of sub-band independence, this system usually gives degraded performance, except in the case of narrow-band noise. All of the systems reviewed here overcome this independence assumption and give improved performance in noise, while also improving or not significantly degrading performance with clean speech. The “all-combinations multi-band” hybrid trains a separate ANN for each sub-band combination. This, however, typically requires a large number of ANNs. The “all-combinations multi-stream” hybrid trains an ANN expert for every combination of just a small number of complementary data streams. Combining multiple ANN posteriors using maximum a-posteriori (MAP) weighting gives rise to the further strategy of hypothesis-level combination by MAP selection. An alternative strategy for exploiting the classification capacity of ANNs is the “tandem hybrid” approach, in which one or more ANN classifiers are trained with multi-condition data to generate discriminative features for input to an ASR system. The “multi-stream tandem hybrid” trains an ANN for each of a number of complementary feature streams, permitting multi-stream data fusion. The “narrow-band tandem hybrid” trains an ANN for each of a number of particularly narrow frequency sub-bands. This gives improved robustness to noises not seen during training. Of the systems presented, all of the multi-stream systems provide generic models for multi-modal data fusion. Test results for each system are presented and discussed.
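
The "weighted combination of state posterior probability outputs" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple linear weighted sum over streams for a single frame, with weights that sum to one. The function name `combine_posteriors`, the weights, and the example values are hypothetical and are not taken from the article, which may use a different combination rule.

```python
def combine_posteriors(stream_posteriors, weights):
    """Linearly combine per-stream state posteriors for one frame.

    stream_posteriors: list of lists, one posterior vector per stream
                       (each vector sums to 1 over the HMM states).
    weights:           one non-negative weight per stream, summing to 1.
    Assumption: a linear weighted sum; other rules (e.g. log-linear)
    are equally plausible and are not shown here.
    """
    if len(stream_posteriors) != len(weights):
        raise ValueError("need exactly one weight per stream")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    n_states = len(stream_posteriors[0])
    combined = [0.0] * n_states
    for posterior, weight in zip(stream_posteriors, weights):
        if len(posterior) != n_states:
            raise ValueError("all streams must cover the same states")
        for state, prob in enumerate(posterior):
            combined[state] += weight * prob
    return combined


# Two hypothetical sub-band experts scoring three states for one frame.
posteriors = [
    [0.7, 0.2, 0.1],  # expert for stream 1
    [0.2, 0.5, 0.3],  # expert for stream 2
]
weights = [0.75, 0.25]  # illustrative reliability weights

combined = combine_posteriors(posteriors, weights)
best_state = max(range(len(combined)), key=combined.__getitem__)
```

Because each input vector sums to one and the weights sum to one, the combined vector is again a valid posterior distribution, so it can be passed to a decoder in place of a single ANN's output.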
