Confusion modelling for automated lip-reading using weighted finite-state transducers.

Authors: Barry-John Theobald, Dominic Howell, Stephen J. Cox

DOI:

Keywords:

Abstract: Automated lip-reading involves recognising speech from only the visual signal. The accuracy of current state-of-the-art lip-reading systems is significantly lower than that obtained by acoustic speech recognisers. These poor results are most likely due to the lack of information about speech production available in the visual signal: for example, it is impossible to discriminate voiced and unvoiced sounds, or many places of articulation, from visual signals. Our approach to this problem is to regard the visual speech signal as having been produced by a speaker who has a reduced phonemic repertoire, and to attempt to compensate for this. In this respect, visual speech is similar to dysarthric speech, which is produced by speakers who have poor control over their articulators, leading them to speak with a reduced and distorted set of phonemes. In previous work, we found that the use of weighted finite-state transducers improved recognition performance on dysarthric speech considerably. In this paper, we report the results of applying this technique to lip-reading. The technique works, but our initial results are not as good as those obtained using a conventional approach, and we discuss why this might be so and what the prospects for future investigation are.

References (14)
Mehryar Mohri, Weighted Finite-State Transducer Algorithms. An Overview. Formal Languages and Applications, pp. 551-563 (2004), 10.1007/978-3-540-39886-8_29
Pamela L. Jackson, The Theoretical Minimal Unit for Visual Speech Perception: Visemes and Coarticulation. Volta Review, vol. 90, pp. 99-115 (1988)
Yuxuan Lan, Richard W. Harvey, Barry-John Theobald, Jacob L. Newman, Stephen J. Cox, The Challenge of Multispeaker Lip-Reading. AVSP, pp. 179-184 (2008)
Omar Caballero Morales, Stephen Cox, Application of Weighted Finite-State Transducers to Improve Recognition Accuracy for Dysarthric Speech. Conference of the International Speech Communication Association, pp. 1761-1764 (2008)
Johan Schalkwyk, Wojciech Skut, Mehryar Mohri, Cyril Allauzen, Michael Riley, OpenFst: a general and efficient weighted finite-state transducer library. International Conference on Implementation and Application of Automata, pp. 11-23 (2007), 10.1007/978-3-540-76336-9_3
Yuxuan Lan, Richard W. Harvey, Eng-Jon Ong, Richard Bowden, Barry-John Theobald, Comparing Visual Features for Lipreading. AVSP, pp. 102-106 (2009)
Mehryar Mohri, Fernando Pereira, Michael Riley, Weighted finite-state transducers in speech recognition. Computer Speech & Language, vol. 16, pp. 69-88 (2002), 10.1006/CSLA.2001.0184
Cletus G. Fisher, Confusions Among Visually Perceived Consonants. Journal of Speech and Hearing Research, vol. 11, pp. 796-804 (1968), 10.1044/JSHR.1104.796
I. Matthews, T.F. Cootes, J.A. Bangham, S. Cox, R. Harvey, Extraction of visual features for lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 198-213 (2002), 10.1109/34.982900
Mehryar Mohri, Finite-state transducers in language and speech processing. Computational Linguistics, vol. 23, pp. 269-311 (1997)