Authors: I. Matthews, J.A. Bangham, S. Cox
DOI: 10.1109/ICSLP.1996.607019
Keywords:
Abstract: There has recently been increasing interest in the idea of enhancing speech recognition by using visual information derived from the talker's face. This paper demonstrates a nonlinear image decomposition, in the form of a "sieve", applied to the speech recognition task. Information from the mouth region is used, with an audio-visual database of the letters A–Z spoken by four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth, on a per-frame basis. Results are presented for visual-only, audio-only and a simple combined audio-visual case.
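The abstract does not spell out the sieve or how the scale histogram is computed. The following is a minimal sketch to illustrate the general idea, not the authors' implementation. It assumes a 1D sieve applied down each column of the mouth window, and it uses a flat alternating sequential open/close filter as a simple stand-in for the sieve. At scale s, maxima and then minima up to s samples wide are removed. The "granules" removed at each scale are summed into one histogram bin, here by total absolute amplitude (counting granules would be an equally plausible choice). All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _flat_filter(x, left, right, reducer):
    """Apply min or max over the window x[i-left .. i+right], clipped at the ends."""
    padded = np.pad(x, (left, right), mode="edge")
    return reducer(sliding_window_view(padded, left + right + 1), axis=1)


def _open(x, s):
    """Flat 1D opening with a length-(s+1) segment: removes maxima up to s samples wide."""
    a, b = s // 2, s - s // 2
    return _flat_filter(_flat_filter(x, a, b, np.min), b, a, np.max)


def _close(x, s):
    """Flat 1D closing with a length-(s+1) segment: fills minima up to s samples wide."""
    a, b = s // 2, s - s // 2
    return _flat_filter(_flat_filter(x, b, a, np.max), a, b, np.min)


def sieve_scale_histogram_1d(column, max_scale):
    """Total absolute granule amplitude removed at each scale 1..max_scale (hypothetical)."""
    y = np.asarray(column, dtype=float)
    hist = np.zeros(max_scale)
    for s in range(1, max_scale + 1):
        filtered = _close(_open(y, s), s)      # remove extrema of width <= s
        hist[s - 1] = np.abs(y - filtered).sum()  # granules at scale s
        y = filtered                           # next scale works on the simplified signal
    return hist


def scale_histogram(mouth_window, max_scale=None):
    """Sum the 1D scale histograms over every column of a gray-scale window (hypothetical)."""
    img = np.asarray(mouth_window, dtype=float)
    if max_scale is None:
        max_scale = img.shape[0]
    per_column = [sieve_scale_histogram_1d(img[:, c], max_scale) for c in range(img.shape[1])]
    return np.stack(per_column).sum(axis=0)


# Usage: a synthetic 8x2 "frame" with a 1-pixel spike and a 3-pixel plateau.
frame = np.zeros((8, 2))
frame[2, 0] = 5.0    # removed at scale 1
frame[2:5, 1] = 4.0  # removed at scale 3 (3 samples x amplitude 4)
print(scale_histogram(frame))  # scale 1 gets 5.0, scale 3 gets 12.0, other bins 0
```

Each frame therefore yields a fixed-length vector indexed by scale, which could serve as a per-frame visual feature vector. Because features are computed directly from pixel intensities, no explicit lip contour is fitted.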