Lip geometric features for human–computer interaction using bimodal speech recognition: comparison and analysis

Authors: Mustafa N Kaynak, Qi Zhi, Adrian David Cheok, Kuntal Sengupta, Zhang Jian

DOI: 10.1016/J.SPECOM.2004.01.003

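Abstract: Bimodal speech recognition is a novel extension of acoustic speech recognition in which both acoustic and visual information are used to improve recognition accuracy in noisy environments. Although various bimodal systems have been developed, a rigorous and detailed comparison of the possible geometric features from speakers' faces has not yet been given in previous papers. Thus, in this paper, these geometric features are compared and rigorously analyzed for their importance in bimodal speech recognition. The relevant information of each single feature is used to determine the best feature combination for both visual-only and bimodal speech recognition. Of the features analyzed, the lip vertical aperture is the most relevant, and the set formed by the vertical and horizontal lip apertures and the first-order derivative of the lip corner angle gives the best results among the reduced feature sets that were analyzed. The effect of the modelling parameters of hidden Markov models (HMMs) on each feature's recognition accuracy is also analyzed. Finally, the accuracies of acoustic-only, visual-only, and bimodal recognition methods are experimentally determined using the optimized HMMs and geometric features. Compared with acoustic-only recognition, the bimodal scheme achieves much improved accuracy using the geometric features, especially in the presence of noise. The results obtained showed that as few as three labial features are sufficient to improve the recognition rate by 20% (from 62% with acoustic-only information to 82% with audio-visual information at a signal-to-noise ratio (SNR) of 0 dB).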
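The three-feature set singled out in the abstract (vertical aperture, horizontal aperture, and the first-order derivative of the lip corner angle) can be sketched as a per-frame computation. This is a minimal illustration, not the authors' implementation: the four-landmark input layout, the definition of the corner angle as the angle at the left lip corner between the rays to the upper and lower lip midpoints, and the use of NumPy's `np.gradient` for the derivative are all assumptions made for this sketch.

```python
import numpy as np


def lip_features(landmarks: np.ndarray) -> np.ndarray:
    """Return a (T, 3) array: [vertical aperture, horizontal aperture, d(corner angle)/dt].

    landmarks: array of shape (T, 4, 2) holding, per frame, the (x, y)
    coordinates of the left corner, right corner, upper-lip midpoint and
    lower-lip midpoint. This layout is hypothetical, chosen for the sketch.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    left, right, upper, lower = (landmarks[:, i, :] for i in range(4))

    # Apertures as Euclidean distances between opposing lip points.
    vertical = np.linalg.norm(upper - lower, axis=1)
    horizontal = np.linalg.norm(right - left, axis=1)

    # Assumed corner angle: angle at the left corner between the rays
    # towards the upper and lower lip midpoints.
    a = upper - left
    b = lower - left
    cos_angle = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))

    # First-order derivative per frame: central differences in the interior,
    # one-sided differences at the ends (needs at least 2 frames).
    d_angle = np.gradient(angle)

    return np.column_stack([vertical, horizontal, d_angle])


# Two synthetic frames: the mouth opens wider in the second frame.
frames = np.array([
    [[0, 0], [4, 0], [2, 1], [2, -1]],
    [[0, 0], [4, 0], [2, 2], [2, -2]],
])
print(lip_features(frames))
```

Each row of the result would serve as one observation vector in a frame sequence; in a system like the one the abstract describes, such sequences would be modelled with HMMs, but the HMM topology and training are not shown here.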