A hybrid approach to bimodal speech recognition

Authors: C. Bregler, S.M. Omohundro, Yochai Konig

DOI: 10.1109/ACSSC.1994.471514

Abstract: We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining the visual and acoustic speech information improves the recognition performance significantly, especially in noisy environments. This is achieved with a hybrid speech recognition architecture, consisting of a new visual learning and tracking mechanism, a channel robust acoustic front end, a connectionist phone classifier, and an HMM based sentence classifier. Our focus in this paper is on the visual subsystem based on "surface-learning" and active vision models. Our bimodal hybrid speech recognition system has already been applied to a multi-speaker spelling task, and work is in progress to apply it to a speaker independent spontaneous speech task, the "Berkeley Restaurant Project (BeRP)".

References (11)
Andreas Stolcke, Chuck Wooters, Eric Fosler, Gary N. Tajchman, Daniel Jurafsky, Jonathan Segal, Nelson Morgan, The Berkeley Restaurant Project. Conference of the International Speech Communication Association (1994)
E. Petajan, B. Bischoff, D. Bodoff, N. M. Brooke, An improved automatic lipreading system to enhance speech recognition. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '88, pp. 19-25 (1988), 10.1145/57167.57170
M. Kirby, F. Weisser, G. Dangelmayr, A model problem in the representation of digital image sequences. Pattern Recognition, vol. 26, pp. 63-73 (1993), 10.1016/0031-3203(93)90088-E
Dominic W. Massaro, Michael M. Cohen, Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, vol. 9, pp. 753-771 (1983), 10.1037//0096-1523.9.5.753
Alan L. Yuille, Deformable templates for face recognition. Journal of Cognitive Neuroscience, vol. 3, pp. 59-70 (1991), 10.1162/JOCN.1991.3.1.59
H. Hermansky, N. Morgan, A. Bayya, P. Kohn, RASTA-PLP speech analysis technique. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 121-124 (1992), 10.1109/ICASSP.1992.225957
Michael Kass, Andrew Witkin, Demetri Terzopoulos, Snakes: Active Contour Models. International Journal of Computer Vision, vol. 1, pp. 321-331 (1988), 10.1007/BF00133570
C. Bregler, H. Hild, S. Manke, A. Waibel, Improving connected letter recognition by lipreading. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 557-560 (1993), 10.1109/ICASSP.1993.319179
Matthew Turk, Alex Pentland, Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, pp. 71-86 (1991), 10.1162/JOCN.1991.3.1.71
B.P. Yuhas, M.H. Goldstein, T.J. Sejnowski, Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, vol. 27, pp. 65-71 (1989), 10.1109/35.41402