Authors: Mustafa N Kaynak, Qi Zhi, Adrian David Cheok, Kuntal Sengupta, Zhang Jian
DOI: 10.1016/J.SPECOM.2004.01.003
Keywords:
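Abstract: Bimodal speech recognition is a novel extension of acoustic speech recognition for which both acoustic and visual speech information are used to improve the recognition accuracy in noisy environments. Although various bimodal speech systems have been developed, a rigorous and detailed comparison of the possible geometric visual features from speakers' faces has not been given yet in previous papers. Thus, in this paper, the geometric visual features are compared and analyzed rigorously for their importance in bimodal speech recognition. The relevant information of each possible single visual feature is used to determine the best combination of geometric visual features for both visual-only and bimodal speech recognition. From the geometric visual features analyzed, the lip vertical aperture is the most relevant; and the set formed by the vertical and horizontal lip apertures and the first order derivative of the lip corner angle gives the best results among the possibilities of reduced sets of geometric features that were analyzed. Also, the effect of the modelling parameters of hidden Markov models (HMM) on each single geometric lip feature's recognition accuracy is analyzed. Finally, the accuracies of the acoustic-only, visual-only, and bimodal speech recognition methods are experimentally determined and compared using the optimized HMMs and geometric visual features. Compared to acoustic-only and visual-only speech recognition, the bimodal speech recognition scheme has a much improved recognition accuracy using the geometric visual features, especially in the presence of noise. The results obtained showed that a set of as few as three labial geometric features is sufficient to improve the recognition rate by as much as 20% (from 62%, with acoustic-only information, to 82%, with audio-visual information, at a signal-to-noise ratio (SNR) of 0 dB).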