Experiment with asynchrony in multimodal speech communication

Authors: Jonas Beskow, Björn Granström, Marie Molander

Keywords: Speech communication, Telephone communication, Mathematics, Analysis of variance, Speech recognition, Hearing loss, Intelligibility (communication), Perception, Speech technology, Negative number

Abstract: The purpose of this study was to examine the delay effects in audiovisual speech perception for natural and synthetic faces. The main focus was on the SYNFACE project, the development of a telephone communication aid for hearing impaired persons. In the experiments, the consequence of temporal displacement of the audio in relation to the visual channel was investigated. The audio was presented with a vocoder-like distortion to simulate hearing loss. Twelve different experimental conditions were presented to the subjects in two separate sessions. The synthetic face was tested with audio-leading (negative numbers) as well as audio-lagging (positive numbers) stimuli, whereas the natural face was tested only with audio-leading stimuli. Asynchronies examined were 50, 175 and 300 ms. In addition, two reference conditions were examined: synchrony and audio-only. Tests with ANOVA including both faces revealed that neither the -300 ms nor the -175 ms condition was significantly better than the audio-only condition, which implies that the final product would not be beneficial with delays of this magnitude. The -50 ms condition, however, did not show lower intelligibility scores than the synchronous condition. Unfortunately, the delay measured in the present prototype is greater than this. It would, therefore, be interesting to investigate asynchronies between -50 and -175 ms to see exactly where the intelligibility drops. The ANOVA further showed that the effect of face type was non-significant, indicating that the quality of the synthetic face is close to that of the natural face. [...] tolerance for larger delays, verified by a significant decrease in performance as late as at +300 ms (the corresponding [...] ms). Even a gain was found for the +50 ms condition compared to synchrony. However, this was not significant, and the statistical analysis showed that asynchronies within the interval [-50, +175] ms have a small [...] spoken message [...]
