Authors: Li Liu, Gang Feng, Denis Beautemps
DOI: 10.1109/ICASSP.2017.7953134
Keywords: Pixel, Computer vision, Cued speech, Computer science, Luminance, Aperture, Feature extraction, Visualization, Artificial intelligence, Discrete cosine transform
Abstract: In this paper, a novel automatic approach to extract the inner-lip contour of speakers without using artifices is proposed. The method is based on a recent facial contour extraction model developed in computer vision, called the Constrained Local Neural Field (CLNF), which provides eight characteristic points (landmarks) defining the contour. However, when applied directly to our visual data, including Cued Speech (CS) data, CLNF fails in about 50% of cases. We propose a modified model to re-estimate the original landmarks. A dynamic template based on the first derivative of the smoothed luminance variation is explored in the new model, giving a precise estimation of the lip aperture. The approach is evaluated on 4800 images of three French speakers. The proposed method corrects 95% of the errors, and a total RMSE of one pixel (i.e., 0.05 cm on average) is reached, instead of four pixels with CLNF.
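To make the abstract's key idea concrete, below is a minimal, hypothetical sketch of aperture estimation from the first derivative of a smoothed luminance variation. It is not the authors' implementation: the function name `estimate_aperture`, the Gaussian smoothing width `sigma`, and the synthetic profile are all assumptions for illustration only.

```python
# Hypothetical sketch: estimating the inner-lip aperture from a vertical
# luminance profile by differentiating a smoothed luminance variation.
# Names and parameters are illustrative, not the paper's implementation.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def estimate_aperture(luminance_profile, sigma=2.0):
    """Estimate the inner-lip aperture (in pixels) along one vertical line.

    luminance_profile: 1-D array of gray-level values sampled from the
    upper lip down to the lower lip.
    """
    # Smooth the raw luminance to suppress pixel noise before differentiating.
    smoothed = gaussian_filter1d(luminance_profile.astype(float), sigma=sigma)

    # First derivative of the smoothed luminance: strong transitions mark the
    # boundaries between the brighter lips and the darker mouth cavity.
    derivative = np.gradient(smoothed)

    # Upper inner-lip edge: strongest negative transition (bright lip -> dark cavity).
    upper_edge = int(np.argmin(derivative))
    # Lower inner-lip edge: strongest positive transition below the upper edge.
    lower_edge = upper_edge + int(np.argmax(derivative[upper_edge:]))

    return lower_edge - upper_edge  # aperture in pixels

# Usage with a synthetic profile: bright lips surrounding a dark mouth cavity.
profile = np.concatenate([np.full(20, 180), np.full(15, 40), np.full(20, 180)])
print(estimate_aperture(profile))  # roughly 15 pixels for this synthetic case
```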