Inner lips feature extraction based on CLNF with hybrid dynamic template for Cued Speech

Authors: Li Liu, Gang Feng, Denis Beautemps

Keywords: Position (vector), Artificial intelligence, Spline interpolation, Luminance, Vowel, Computer vision, Feature extraction, Computer science, Pixel, Viseme, Cued speech

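Abstract: In previous French Cued Speech (CS) studies, one of the widely used methods is painting blue color on the speaker's lips to make lips feature extraction easier. In this paper, in order to get rid of this artifice, a novel automatic method to extract the inner lips contour of CS speakers is presented. This method is based on a recent facial contour extraction model developed in computer vision, called Constrained Local Neural Field (CLNF), which provides eight characteristic landmarks describing the inner lips contour. However, directly applied to our CS data, CLNF fails in about 41.4% of cases. Therefore, we propose two methods to correct the B parameter (aperture of the inner lips) and the A parameter (width of the inner lips), respectively. For correcting the B parameter, a hybrid dynamic correlation template method (HD-CTM) using the first derivative of the smoothed luminance variation is proposed. HD-CTM is first used to detect the outer lower lip position. Then, the inner lower lip position is obtained by subtracting the validated lower lip thickness (VLLT). For correcting the A parameter, a periodical spline interpolation with a geometrical deformation of six CLNF inner lips landmarks is explored. Combined with an automatic round lips detector, this method is efficient for correcting the A parameter for round lips (the third vowel viseme, made of French vowels with small opening). HD-CTM is evaluated on 4800 images of three French speakers. It corrects about 95% of the CLNF errors of the B parameter, and a total RMSE of one pixel (i.e., 0.05 cm on average) is achieved. The periodical spline interpolation method is tested on 927 round lips images. The total error of CLNF is reduced significantly, which is comparable to the state of the art. Moreover, the third vowel viseme is properly distributed in the A-B parameter plane after using this method.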
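The B-parameter correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' HD-CTM: the correlation template is replaced here by simple peak picking on the first derivative of a smoothed luminance profile, and the smoothing is a plain moving average. All function names, the window size, and the assumption that the lower lip is darker than the skin below it are hypothetical choices made for illustration only.

```python
from math import sqrt

def smooth(profile, half_window=2):
    """Moving-average smoothing (an assumed stand-in for the paper's smoothing)."""
    n = len(profile)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

def first_derivative(values):
    """Central differences, one-sided at both ends (needs at least 2 samples)."""
    n = len(values)
    deriv = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        deriv.append((values[hi] - values[lo]) / (hi - lo))
    return deriv

def outer_lower_lip_row(luminance, search_from=0):
    """Row with the strongest dark-to-bright transition at or below search_from.

    Assumes a vertical luminance profile through the mouth, rows increasing
    downward, with the lower lip darker than the skin beneath it.
    """
    deriv = first_derivative(smooth(luminance))
    return max(range(search_from, len(deriv)), key=lambda i: deriv[i])

def inner_lower_lip_row(outer_row, lip_thickness):
    """Inner lower lip position: outer position minus the lip thickness (VLLT)."""
    return outer_row - lip_thickness

def rmse(estimates, references):
    """Root-mean-square error between estimated and reference values."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return sqrt(sum(squared) / len(squared))

# Synthetic column: mouth interior (dark), lower lip (medium), skin (bright).
column = [40] * 10 + [90] * 10 + [200] * 10
outer = outer_lower_lip_row(column)
inner = inner_lower_lip_row(outer, lip_thickness=9)
```

On a real image the profile would be read from a pixel column through the lip centre, and the thickness would come from a validated measurement rather than a constant. With the abstract's reported scale of about 0.05 cm per pixel, an RMSE in pixels converts to centimetres by multiplying by 0.05.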


