Automatic visual speech segmentation

Authors: Hamed Talea, Khashayar Yaghmaie

DOI: 10.1109/ICCSN.2011.6014877

Keywords: Feature extraction, Acoustic model, Artificial intelligence, Audio mining, Speech segmentation, Syllable, Speech processing, Image segmentation, Computer science, Voice activity detection, Pattern recognition, Speech recognition

Abstract: Speech recognition techniques which rely on audio features of speech degrade in performance in noisy environments. Visual speech recognition helps by incorporating a visual signal into the process. The automatic speech recognition (ASR) system can be significantly enhanced with additional information from elements such as the movement of the lips, tongue, and teeth. This paper introduces a combined method for lip region extraction and mouth area estimation, which is then used to develop a technique for visual speech segmentation. The accuracy of the technique is verified by applying it to syllable boundary separation following vowel segmentation in multi-word phrases.

References (19)
Eric David Petajan, Automatic lipreading to enhance speech recognition (speech reading). University of Illinois at Urbana-Champaign (1984)
David M. W. Powers, Trent W. Lewis, Audio-Visual Speech Recognition using Red Exclusion and Neural Networks. Journal of Research and Practice in Information Technology, vol. 35, pp. 41-64 (2003)
A. Adjoudani, C. Benoît, On the Integration of Auditory and Visual Parameters in an HMM-based ASR. Springer, Berlin, Heidelberg, pp. 461-471 (1996), 10.1007/978-3-662-13015-5_35
Linda G. Shapiro, Robert M. Haralick, Computer and Robot Vision. Addison-Wesley Longman Publishing Co., Inc. (1991)
M.W. Mak, W.G. Allen, Lip-motion analysis for speech segmentation in noise. Speech Communication, vol. 14, pp. 279-296 (1994), 10.1016/0167-6393(94)90067-1
Tsuhan Chen, R.R. Rao, Audio-visual integration in multimodal communication. Proceedings of the IEEE, vol. 86, pp. 837-852 (1998), 10.1109/5.664274
Kathleen E. Finn, Allen A. Montgomery, Automatic optically-based recognition of speech. Pattern Recognition Letters, vol. 8, pp. 159-164 (1988), 10.1016/0167-8655(88)90094-3
Carol A. Fowler, Segmentation of coarticulated speech in perception. Perception & Psychophysics, vol. 36, pp. 359-368 (1984), 10.3758/BF03202790
Vahideh Sadat Sadeghi, Khashayar Yaghmaie, Vowel Recognition using Neural Networks (2006)