Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition

Authors: Shinsuke Okita, Yasue Mitsukura, Nozomu Hamada

DOI: 10.1109/SPC.2013.6735104

Keywords:

Abstract: For purposes such as automatic speech recognition, animation synthesis, and speaker verification, there have been studies on the 'viseme'. A viseme is a visually identifiable unit of utterance, the visual-domain equivalent of the phoneme in the audio domain. The classification and discrimination of visemes are still important topics. This paper focuses on the number of viseme units and the discrimination procedure for Japanese visemes: we extend the number of visemes from 6 to 9 and expand the word representation by viseme series, then propose hierarchical weighted discrimination using multiple discriminative analysis (MDA) to enhance discrimination ability. Experiments were conducted to verify and discuss the effectiveness of our proposals, and the results confirmed the validity of the proposed methods.

References (16)
J. Andrew Bangham, Richard Harvey, Iain Matthews, Stephen Cox. Nonlinear scale decomposition based features for visual speech recognition. European Signal Processing Conference, pp. 1-4 (1998). DOI: 10.5281/ZENODO.36896
Algirdas Pakstas, Robert Forchheimer, Igor S. Pandzic. MPEG-4 Facial Animation: The Standard, Implementation and Applications. John Wiley & Sons, Inc. (2002).
Tsuyoshi Miyazaki, Toyoshiro Nakashima. The Codification of Distinctive Mouth Shapes and the Expression Method of Data Concerning Changes in Mouth Shape when Uttering Japanese. IEEJ Transactions on Electronics, Information and Systems, vol. 129, pp. 2108-2114 (2009). DOI: 10.1541/IEEJEISS.129.2108
Juergen Luettin, Neil A. Thacker. Speechreading using Probabilistic Models. Computer Vision and Image Understanding, vol. 65, pp. 163-178 (1997). DOI: 10.1006/CVIU.1996.0570
Dahai Yu, Ovidiu Ghita, Alistair Sutherland, Paul F. Whelan. A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition. IPSJ Transactions on Computer Vision and Applications, vol. 2, pp. 25-38 (2010). DOI: 10.2197/IPSJTCVA.2.25
Mohammad Aghaahmadi, Mohammad Mahdi Dehshibi, Azam Bastanfard, Mahmood Fazlali. Clustering Persian viseme using phoneme subspace for developing visual speech application. Multimedia Tools and Applications, vol. 65, pp. 521-541 (2013). DOI: 10.1007/S11042-012-1128-7
G. Potamianos, H.P. Graf, E. Cosatto. An image transform approach for HMM based automatic lipreading. International Conference on Image Processing, pp. 173-177 (1998). DOI: 10.1109/ICIP.1998.999008
Takeshi Saitoh, Kazutoshi Morishita, Ryosuke Konishi. Analysis of efficient lip reading method for various languages. International Conference on Pattern Recognition, pp. 1-4 (2008). DOI: 10.1109/ICPR.2008.4761049
H.E. Cetingul, Y. Yemez, Engin Erzin, A.M. Tekalp. Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading. IEEE Transactions on Image Processing, vol. 15, pp. 2879-2891 (2006). DOI: 10.1109/TIP.2006.877528
Michiel Visser, Mannes Poel, Anton Nijholt. Classifying Visemes for Automatic Lipreading. Text, Speech and Dialogue, pp. 349-352 (1999). DOI: 10.1007/3-540-48239-3_65