Speechreading using Probabilistic Models

Authors: Juergen Luettin, Neil A. Thacker

DOI: 10.1006/CVIU.1996.0570

Keywords: Image (mathematics), Speech recognition, Computer science, Speechreading, Visibility (geometry), Specularity, Hidden Markov model, Gaussian, Probabilistic logic, Tracking (particle physics)

Abstract: We describe a robust method for locating and tracking lips in gray-level image sequences. Our approach learns patterns of shape variability from a training set, which constrains the model during image search to only deform in ways similar to the training examples. Image search is guided by a learned gray-level model, which is used to describe the large appearance variability of lips. Such variability might be due to different individuals, illumination, mouth opening, specularity, or visibility of teeth and tongue. Visual speech features are recovered from the tracking results and represent both shape and intensity information. We describe a speechreading (lip-reading) system, where the extracted features are modeled by Gaussian distributions and their temporal dependencies by hidden Markov models. Experimental results are presented for locating lips, tracking lips, and speechreading. The database used consists of a broad variety of speakers and was recorded in a natural environment with no special lighting or lip markers used. For a speaker-independent digit recognition task using visual information only, the system achieved an accuracy about equivalent to that of untrained humans.

References (61)
Eric David Petajan. Automatic lipreading to enhance speech recognition (speech reading). University of Illinois at Urbana-Champaign (1984).
Quentin Summerfield. Audio-visual Speech Perception, Lipreading and Artificial Stimulation. Hearing Science and Hearing Disorders, pp. 131-182 (1983). DOI: 10.1016/B978-0-12-460440-7.50010-7
Joseph S. Perkell. Physiology of Speech Production. Phonosurgery, pp. 5-21 (1989). DOI: 10.1007/978-4-431-68358-2_2
Alex Waibel, Paul Duchnowski, Uwe Meier. See me, hear me: integrating automatic speech recognition and lip-reading. Conference of the International Speech Communication Association (1994).
M. E. Lutman, M. P. Haggard. Hearing science and hearing disorders. Academic Press (1983).
C. Benoît, T. Guiard-Marigny, B. Le Goff, A. Adjoudani. Which components of the face do humans and machines best speechread. Springer Berlin Heidelberg, pp. 315-328 (1996). DOI: 10.1007/978-3-662-13015-5_24
Chung-Lin Huang, Ching-Wen Chen. Human facial feature extraction for face interpretation and recognition. International Conference on Pattern Recognition, vol. 25, pp. 1435-1444 (1992). DOI: 10.1016/0031-3203(92)90118-3
Steve W. Beet, Neil A. Thacker, Juergen Luettin. Statistical lip modelling for visual speech recognition. European Signal Processing Conference, pp. 1-4 (1996). DOI: 10.5281/ZENODO.36365
Kenji Mase, Alex Pentland. Automatic lipreading by optical-flow analysis. Systems and Computers in Japan, vol. 22, pp. 67-76 (1991). DOI: 10.1002/SCJ.4690220607
Tarcisio Coianiz, Lorenzo Torresani, Bruno Caprile. 2D Deformable Models for Visual Speech Analysis. Springer, Berlin, Heidelberg, pp. 391-398 (1996). DOI: 10.1007/978-3-662-13015-5_29