ROI processing for visual features extraction in lip-reading

Authors: Xiaoping Wang, Yufeng Hao, Degang Fu, Chunwei Yuan

DOI: 10.1109/ICNNSP.2008.4590335

Keywords: Hidden Markov model, Image processing, Feature extraction, Region of interest, Artificial intelligence, Discrete cosine transform, Edge detection, Computer vision, Pattern recognition, Edge enhancement, Image segmentation, Computer science

Abstract: The region of interest (ROI) is the key basis for visual feature extraction in the lip-reading process. In this paper, we discuss ROI processing methods and explore their impact on recognition accuracy by comparing four kinds of processed ROIs obtained using basic image processing methods: gray-scale normalization, difference enhancement, edge enhancement, and segmentation. Speaker-independent recognition tasks were then carried out with the aid of a continuous hidden Markov model (CHMM). The experimental results show that discrete cosine transform (DCT) based features from the normalized ROIs achieve the best performance among these ROIs.
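As a rough illustration of the best-performing combination named in the abstract (gray-scale normalization followed by DCT-based features), here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the normalization is defined or which DCT coefficients are kept, so the zero-mean/unit-variance scaling, the choice of the low-frequency k x k block, and the function names `normalize_gray`, `dct_2d`, and `dct_features` are all assumptions made for this example.

```python
import math

def normalize_gray(roi):
    """Rescale a gray-scale ROI to zero mean and unit variance.
    Assumption: the paper's exact normalization is not given in the abstract."""
    pixels = [p for row in roi for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # flat image: avoid division by zero
    return [[(p - mean) / std for p in row] for row in roi]

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(img):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [u][v] order

def dct_features(roi, k=4):
    """Keep the k x k lowest-frequency coefficients as a feature vector.
    Assumption: this selection rule is illustrative, not taken from the paper."""
    coeffs = dct_2d(normalize_gray(roi))
    return [coeffs[u][v] for u in range(k) for v in range(k)]

# Hypothetical 8 x 8 mouth ROI with synthetic gray levels.
roi = [[(3 * r + 5 * c) % 17 + 40 for c in range(8)] for r in range(8)]
features = dct_features(roi, k=4)
```

In a lip-reading pipeline, one such feature vector per video frame would form the observation sequence passed to the CHMM; `k` must not exceed the ROI's height or width.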


