ROI processing for visual features extraction in lip-reading

Authors: Xiaoping Wang, Yufeng Hao, Degang Fu, Chunwei Yuan

DOI: 10.1109/ICNNSP.2008.4590335

Keywords: Hidden Markov model, Image processing, Feature extraction, Region of interest, Artificial intelligence, Discrete cosine transform, Edge detection, Computer vision, Pattern recognition, Edge enhancement, Image segmentation, Computer science

Abstract: The region of interest (ROI) is the key basis for visual feature extraction in the lip-reading process. In this paper, we discussed ROI processing methods and explored their impact on recognition accuracy by comparing four kinds of processed ROIs obtained with basic image processing methods: gray-scale normalization, difference enhancement, edge enhancement, and segmentation. Speaker-independent lip-reading tasks were then carried out with the aid of a continuous hidden Markov model (CHMM). The experimental results show that discrete cosine transform (DCT) based features of the gray-scale-normalized ROI achieve the best performance among these ROIs.
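The configuration the abstract reports as best (DCT features of a gray-scale-normalized ROI) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: gray-scale normalization is assumed to be a min-max stretch to [0, 1], the transform is an orthonormal 2-D DCT-II, and features are the first `n_coeffs` coefficients in JPEG-style zigzag order. The function names, the 32x48 ROI size, and the default of 30 coefficients are hypothetical; the CHMM recognition stage is not shown.

```python
import numpy as np


def normalize_gray(roi: np.ndarray) -> np.ndarray:
    """Assumed gray-scale normalization: min-max stretch of the ROI to [0, 1]."""
    roi = roi.astype(np.float64)
    lo, hi = roi.min(), roi.max()
    if hi == lo:  # flat patch: avoid division by zero
        return np.zeros_like(roi)
    return (roi - lo) / (hi - lo)


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n); rows are frequencies."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= 1 / np.sqrt(2)  # DC row scaling for orthonormality
    return c * np.sqrt(2 / n)


def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II: transform along columns, then along rows."""
    h, w = block.shape
    return dct_matrix(h) @ block @ dct_matrix(w).T


def zigzag_indices(h: int, w: int) -> list[tuple[int, int]]:
    """JPEG-style zigzag scan order over an h x w grid, starting at DC."""
    return sorted(
        ((r, c) for r in range(h) for c in range(w)),
        # Walk anti-diagonals; odd diagonals go down (row ascending),
        # even diagonals go up (column ascending).
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def dct_features(roi: np.ndarray, n_coeffs: int = 30) -> np.ndarray:
    """Hypothetical feature extractor: normalize, 2-D DCT, keep low frequencies."""
    coeffs = dct2(normalize_gray(roi))
    order = zigzag_indices(*coeffs.shape)[:n_coeffs]
    return np.array([coeffs[r, c] for r, c in order])


# Usage: a random array stands in for a cropped mouth ROI.
rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(32, 48))
feat = dct_features(roi, n_coeffs=30)
print(feat.shape)  # (30,)
```

The zigzag ordering keeps the low-frequency coefficients, which carry most of the energy of a smooth mouth image, so a short vector per frame can serve as the observation for an HMM. Other ROI variants (difference- or edge-enhanced, segmented) could be passed through the same `dct_features` call in place of the normalized ROI for comparison.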
