Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications

Authors: Robert Kaucic, Barney Dalton, Andrew Blake

DOI: 10.1007/3-540-61123-1_154

Abstract: Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations, it is now possible to track human faces and parts of faces in real time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were developed to extract visual speech recognition features from the lip contour. In both cases, visual features have been incorporated into an acoustic automatic speech recogniser. Tests on small isolated-word vocabularies using a dynamic time warping based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers, enabling robust recognition of speech in the presence of acoustic noise.

References (24)
David Goodine, Victor Zue, Joseph Polifroni, Michael S. Phillips, James R. Glass, Hong C. Leung, Stephanie Seneff, Lynette Hirschman. From Speech Recognition to Spoken Language Understanding. Neural Information Processing Systems, pp. 255-261 (1990).
Barney Dalton, Robert Kaucic, Andrew Blake. Automatic Speechreading using dynamic contours. Springer Berlin Heidelberg, pp. 373-382 (1996). DOI: 10.1007/978-3-662-13015-5_27
A. Adjoudani, C. Benoît. On the Integration of Auditory and Visual Parameters in an HMM-based ASR. Springer, Berlin, Heidelberg, pp. 461-471 (1996). DOI: 10.1007/978-3-662-13015-5_35
Lawrence Rabiner, Biing-Hwang Juang. Fundamentals of speech recognition (1993).
David Reynard, Andrew Wildenberg, Andrew Blake, John Marchant. Learning Dynamics of Complex Motions from Image Sequences. European Conference on Computer Vision, pp. 357-368 (1996). DOI: 10.1007/BFB0015550
A.P. Varga, R.K. Moore. Hidden Markov model decomposition of speech and noise. International Conference on Acoustics, Speech, and Signal Processing, pp. 845-848 (1990). DOI: 10.1109/ICASSP.1990.115970
D.G. Stork, G. Wolff, E. Levine. Neural network lipreading system for improved speech recognition. International Joint Conference on Neural Networks, vol. 2, pp. 289-295 (1992). DOI: 10.1109/IJCNN.1992.226994
Andrew Blake, Rupert Curwen, Andrew Zisserman. A framework for spatiotemporal control in the tracking of visual contours. International Journal of Computer Vision, vol. 11, pp. 127-145 (1993). DOI: 10.1007/BF01469225
E. Petajan, B. Bischoff, D. Bodoff, N. M. Brooke. An improved automatic lipreading system to enhance speech recognition. Proceedings of the SIGCHI conference on Human factors in computing systems - CHI '88, pp. 19-25 (1988). DOI: 10.1145/57167.57170
M.W. Mak, W.G. Allen. Lip-motion analysis for speech segmentation in noise. Speech Communication, vol. 14, pp. 279-296 (1994). DOI: 10.1016/0167-6393(94)90067-1