Audiovisual speech recognition using multiscale nonlinear image decomposition

Authors: I. Matthews, J.A. Bangham, S. Cox

DOI: 10.1109/ICSLP.1996.607019

Abstract: There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates a nonlinear image decomposition, in the form of a "sieve", applied to the recognition task. Information from the mouth region is used, with an audio-visual database of the letters A-Z spoken by four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth, on a per-frame basis. Results are presented for visual-only, audio-only, and a simple audio-visual case.
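The idea of a sieve-based scale histogram can be illustrated with a minimal sketch. It rests on several assumptions: each column of the mouth window is treated as a 1-D vertical scan line; the sieve is built from flat 1-D openings and closings of increasing scale, applied as an opening followed by a closing at each scale (one of the open/close sieve variants); and each histogram bin holds the summed absolute granule amplitude at that scale. Counting granules per scale would be an equally reasonable choice. The function names (`sieve_granules_1d`, `scale_histogram_1d`, `scale_histogram_2d`) are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def _opening(f: Sequence[int], length: int) -> List[int]:
    """Flat 1-D opening with a window of `length` samples (requires length <= len(f)).

    In 1-D this removes maxima whose extent is shorter than `length`.
    """
    n = len(f)
    # Minimum over every window that fits entirely inside the signal (erosion).
    mins = [min(f[j:j + length]) for j in range(n - length + 1)]
    # Each sample takes the largest window-minimum among windows that cover it (dilation).
    return [max(mins[max(0, i - length + 1):min(i, n - length) + 1]) for i in range(n)]


def _closing(f: Sequence[int], length: int) -> List[int]:
    """Flat 1-D closing, the dual of the opening: removes short minima."""
    return [-v for v in _opening([-v for v in f], length)]


def sieve_granules_1d(signal: Sequence[int], max_scale: int) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical recursive sieve: returns (granules per scale, residual signal).

    Scale s removes extrema of extent s, so the window length is s + 1.
    """
    if max_scale >= len(signal):
        raise ValueError("max_scale must be smaller than the signal length")
    y = list(signal)
    granules = []
    for s in range(1, max_scale + 1):
        y_next = _closing(_opening(y, s + 1), s + 1)
        # Granules at scale s: whatever this stage removed from the signal.
        granules.append([a - b for a, b in zip(y, y_next)])
        y = y_next
    return granules, y


def scale_histogram_1d(signal: Sequence[int], max_scale: int) -> List[int]:
    """Sum of absolute granule amplitudes at each scale 1..max_scale."""
    granules, _ = sieve_granules_1d(signal, max_scale)
    return [sum(abs(g) for g in d) for d in granules]


def scale_histogram_2d(image: Sequence[Sequence[int]], max_scale: int) -> List[int]:
    """Assumed per-frame feature: sieve each column (vertical scan line) of the
    gray-scale mouth window and add the 1-D histograms together."""
    rows, cols = len(image), len(image[0])
    hist = [0] * max_scale
    for c in range(cols):
        column = [image[r][c] for r in range(rows)]
        for s, v in enumerate(scale_histogram_1d(column, max_scale)):
            hist[s] += v
    return hist
```

For example, a single bright pixel such as `[0, 0, 5, 0, 0]` is removed entirely at scale 1, while a two-sample plateau such as `[0, 0, 4, 4, 0, 0, 0]` survives scale 1 and is removed at scale 2. Because every stage only moves detail from the signal into granules, the granules over all scales plus the residual add back up to the original scan line.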