Lip feature extraction and reduction for HMM-based visual speech recognition systems

Authors: S. Alizadeh, R. Boostani, V. Asadpour

DOI: 10.1109/ICOSP.2008.4697195

Abstract: Lipreading is a main part of audio-visual speech recognition systems, which mostly suffer from redundancy in the extracted features. In this paper, a new approach is proposed to increase lipreading performance by extracting discriminant features. First, faces are detected; then lip key points are located, and four cubic curves characterize the lip contours. Next, visual features are extracted from the contours for each frame. To discriminate each unit (word) from the others, the features of that unit's frames are arranged into a feature vector. Moreover, the differences between the features of frame k and those of the previous frame are used to construct more informative feature vectors. To solve the small sample size problem, direct linear discriminant analysis (D-LDA) is employed to reduce the feature size. To classify the transformed features, a hidden Markov model (HMM) is employed to recognize the units. The algorithm was applied to the M2VTS database. Results show that applying D-LDA as a feature reduction step provides better classification accuracy than employing the HMM without feature reduction.
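The feature construction and reduction steps described in the abstract can be illustrated roughly as follows. This is only a minimal NumPy sketch, not the authors' implementation: the function names `add_delta_features` and `direct_lda` are hypothetical, the zero delta for the first frame is an assumption, and the D-LDA variant shown (diagonalize the between-class scatter first, then the projected within-class scatter) is the commonly described form of the technique, which may differ in detail from what the paper uses. Face detection, lip contour fitting, and the HMM classifier are not covered.

```python
import numpy as np


def add_delta_features(frames):
    """Append frame-to-frame differences to per-frame features.

    frames: array of shape (n_frames, n_features).
    Assumption: the first frame has no predecessor, so its delta is zero.
    """
    frames = np.asarray(frames, dtype=float)
    deltas = np.diff(frames, axis=0, prepend=frames[:1])
    return np.hstack([frames, deltas])


def direct_lda(X, y, n_components=None, tol=1e-10):
    """Return a projection matrix W of shape (n_features, n_components).

    X: (n_samples, n_features) word-level feature vectors.
    y: (n_samples,) class labels.
    Works when n_features exceeds n_samples (small sample size case),
    because the within-class scatter is never inverted in the full space.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mean_all = X.mean(axis=0)

    # Columns are sqrt(n_c) * (class mean - global mean),
    # so the between-class scatter is S_b = phi_b @ phi_b.T.
    phi_b = np.column_stack([
        np.sqrt((y == c).sum()) * (X[y == c].mean(axis=0) - mean_all)
        for c in classes
    ])

    # Step 1: keep only the range space of S_b and whiten it.
    U, s, _ = np.linalg.svd(phi_b, full_matrices=False)
    keep = s > tol * s.max()
    Z = U[:, keep] / s[keep]  # Z.T @ S_b @ Z is the identity

    # Step 2: diagonalize the within-class scatter inside that subspace.
    centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    projected = centered @ Z
    sw_projected = projected.T @ projected
    _, evecs = np.linalg.eigh(sw_projected)  # eigenvalues in ascending order

    # Directions with the smallest within-class scatter come first.
    if n_components is None:
        n_components = evecs.shape[1]
    return Z @ evecs[:, :n_components]
```

Under these assumptions, a word-level vector could be built as `add_delta_features(frames).ravel()` (which requires the same number of frames per word), and the reduced features passed to a classifier would be `X @ direct_lda(X, y)`.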
