Audiovisual speech recognition using multiscale nonlinear image decomposition

DOI: 10.1109/ICSLP.1996.607019

Abstract: There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a "sieve", applied to the task of visual speech recognition. Information derived from the mouth region is used in visual and audio-visual speech recognition of a database of the letters A-Z for four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth on a per-frame basis. Results are presented for visual-only, audio-only and a simple audio-visual case.
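
The scale histogram mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified, opening-based decomposition applied to each column of the mouth window, and the names `opening_1d`, `count_runs` and `scale_histogram` are hypothetical. At each scale, bright features of that vertical extent are removed, and the number of removed features ("granules") is counted per scale.

```python
def opening_1d(signal, width):
    """Flat grayscale opening: running minimum followed by running maximum."""
    n = len(signal)
    if width <= 1:
        return list(signal)
    if width > n:
        raise ValueError("width must not exceed the signal length")
    # Erosion: minimum of every window of `width` samples that fits in the signal.
    eroded = [min(signal[i:i + width]) for i in range(n - width + 1)]
    # Dilation: each sample takes the largest minimum among windows covering it.
    return [max(eroded[max(0, j - width + 1):min(j, n - width) + 1])
            for j in range(n)]


def count_runs(values):
    """Count maximal runs of nonzero entries (one run = one removed feature)."""
    runs, inside = 0, False
    for v in values:
        if v != 0 and not inside:
            runs += 1
        inside = v != 0
    return runs


def scale_histogram(image, max_scale):
    """Count removed bright features per scale over all columns of `image`.

    Assumption: `image` is a list of equal-length rows of gray levels, and
    scale s is handled by an opening of width s + 1 (a simplification).
    """
    hist = [0] * max_scale
    for column in zip(*image):
        current = list(column)
        for scale in range(1, max_scale + 1):
            width = scale + 1
            if width > len(current):
                break
            opened = opening_1d(current, width)
            # What this scale removed from the previous stage.
            granules = [a - b for a, b in zip(current, opened)]
            hist[scale - 1] += count_runs(granules)
            current = opened
    return hist


if __name__ == "__main__":
    # Hypothetical 4x2 window of gray levels, one histogram per frame.
    window = [[0, 0], [3, 0], [5, 4], [0, 0]]
    print(scale_histogram(window, 3))
```

Computing one such histogram per video frame gives a fixed-length feature vector regardless of where the mouth features lie in the window. Handling dark features as well would need a closing or median-based stage, which this sketch leaves out.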
