Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Authors: Ara V. Nefian, Luhong Liang, Xiaobo Pi, Xiaoxing Liu, Kevin Murphy

DOI: 10.1155/S1110865702206083

Keywords: Computer science; Invariant (mathematics); Noise; Statistical model; Pattern recognition; Hidden Markov model; Dynamic Bayesian network; Artificial intelligence; Speech recognition; Word recognition; Audio-visual speech recognition

Abstract: The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM allow them to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
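The difference between the two models named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes discrete hidden states, the textbook definitions of the two models, and hypothetical function and variable names. In a coupled HMM each chain's next state is conditioned on the previous states of both chains, whereas in a factorial HMM each chain evolves on its own and the coupling comes only through a shared observation. Either way, the pair of chains can be treated as one HMM over the product state space, so the usual forward recursion applies.

```python
import numpy as np

def random_conditional(rng, *shape):
    """Random table whose last axis sums to 1 (a conditional distribution)."""
    p = rng.random(shape)
    return p / p.sum(axis=-1, keepdims=True)

def chmm_joint_transition(a_audio, a_visual):
    """Coupled HMM (assumed textbook form).

    a_audio[i, j, k]  = P(audio_t = k  | audio_{t-1} = i, visual_{t-1} = j)
    a_visual[i, j, l] = P(visual_t = l | audio_{t-1} = i, visual_{t-1} = j)
    Returns T[i, j, k, l] = a_audio[i, j, k] * a_visual[i, j, l].
    """
    return np.einsum("ijk,ijl->ijkl", a_audio, a_visual)

def fhmm_joint_transition(b_audio, b_visual):
    """Factorial HMM (assumed textbook form): the chains evolve independently.

    b_audio[i, k]  = P(audio_t = k  | audio_{t-1} = i)
    b_visual[j, l] = P(visual_t = l | visual_{t-1} = j)
    Returns T[i, j, k, l] = b_audio[i, k] * b_visual[j, l].
    """
    return np.einsum("ik,jl->ijkl", b_audio, b_visual)

def forward_likelihood(pi, trans, obs_lik):
    """Forward recursion over the product state space.

    pi[i, j]         : initial joint state distribution
    trans[i, j, k, l]: joint transition table from one of the builders above
    obs_lik[t, k, l] : likelihood of frame t given the joint state (k, l)
    """
    alpha = pi * obs_lik[0]
    for b_t in obs_lik[1:]:
        alpha = np.einsum("ij,ijkl->kl", alpha, trans) * b_t
    return float(alpha.sum())

# Hypothetical sizes: 3 audio states, 2 visual states.
rng = np.random.default_rng(0)
n_a, n_v = 3, 2
chmm_T = chmm_joint_transition(random_conditional(rng, n_a, n_v, n_a),
                               random_conditional(rng, n_a, n_v, n_v))
fhmm_T = fhmm_joint_transition(random_conditional(rng, n_a, n_a),
                               random_conditional(rng, n_v, n_v))
pi = np.full((n_a, n_v), 1.0 / (n_a * n_v))
```

In this sketch the observation term carries the other half of the distinction. For a CHMM, `obs_lik[t, k, l]` would factor into an audio likelihood given state `k` times a visual likelihood given state `l`. For an FHMM it would be a single likelihood of the joint audio-visual frame given the pair `(k, l)`.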
