The Challenge of Multispeaker Lip-Reading

Authors: Yuxuan Lan, Richard W. Harvey, Barry-John Theobald, Jacob L. Newman, Stephen J. Cox

Abstract: In speech recognition, the problem of speaker variability has been well studied. Common approaches to dealing with it include normalising for a speaker's vocal tract length and learning a linear transform that moves the speaker-independent models closer to a new speaker. In pure lip-reading (no audio) the problem has been less well studied. Results are often presented based on speaker-dependent (single speaker) or multi-speaker (speakers in the test-set are also in the training-set) data, situations that are of limited use in real applications. This paper shows the danger of not using different speakers in the training- and test-sets. Firstly, we present classification results on the single-word database AVletters 2, which is a high-definition version of the well-known AVletters database. By careful choice of features, we show that it is possible for the performance of visual-only lip-reading to be very close to that of audio-only recognition for the single-speaker and multi-speaker configurations. However, in the speaker-independent configuration, the performance of the visual-only channel degrades dramatically. By applying multidimensional scaling (MDS) to both the audio features and the visual features, we demonstrate that lip-reading visual features, when compared with the MFCCs commonly used for audio speech recognition, have inherently small variation within a single speaker across all classes spoken. However, visual features are highly sensitive to the identity of the speaker, whereas audio features are relatively invariant.
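The difference between the multi-speaker and speaker-independent configurations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' experimental protocol: the function names, the `speaker` and `label` keys, the 25% test fraction, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def multispeaker_split(samples, test_fraction=0.25):
    """Multi-speaker split: every speaker contributes to both train and test."""
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample["speaker"]].append(sample)
    train, test = [], []
    for items in by_speaker.values():
        # Hold out at least one utterance per speaker for testing.
        n_test = max(1, int(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def speaker_independent_folds(samples):
    """Leave-one-speaker-out: the test speaker never appears in training."""
    speakers = sorted({s["speaker"] for s in samples})
    for held_out in speakers:
        train = [s for s in samples if s["speaker"] != held_out]
        test = [s for s in samples if s["speaker"] == held_out]
        yield held_out, train, test


# Hypothetical toy data: 3 speakers, 4 single-letter utterances each.
samples = [
    {"speaker": f"spk{i}", "label": letter}
    for i in range(3)
    for letter in "ABCD"
]
```

With a split like `multispeaker_split`, a classifier can exploit speaker-specific appearance it has already seen, which is the situation the abstract warns about; `speaker_independent_folds` removes that overlap.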
