OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis

Authors: Iryna Anina, Ziheng Zhou, Guoying Zhao, Matti Pietikainen

DOI: 10.1109/FG.2015.7163155

Abstract: Visual speech constitutes a large part of our non-rigid facial motion and contains important information that allows machines to interact with human users, for instance, through automatic visual speech recognition (VSR) and speaker verification. One of the major obstacles to research on non-rigid mouth motion analysis is the absence of suitable databases. Those available for public research either lack a sufficient number of speakers or utterances or contain constrained view points, which limits their representativeness and usefulness. This paper introduces a newly collected multi-view audiovisual database for non-rigid mouth motion analysis. It includes more than 50 speakers uttering three types of utterances and, more importantly, thousands of videos simultaneously recorded by six cameras from five different views spanned between the frontal and profile views. Moreover, a simple VSR system has been developed and tested on the database to provide some baseline performance.

References (23)
Bowon Lee, Suketu Kamdar, Thomas S. Huang, Sarah Borys, Mark Hasegawa-Johnson, Ming Liu, Camille Goudeseune. AVICAR: audio-visual speech corpus in a car environment. Conference of the International Speech Communication Association (2004).
J. Matas, K. Messer, J. Kittler, Gilbert Maître, Juergen Luettin. XM2VTSDB: The Extended M2VTS Database. Proc. Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA'99) (1999).
Yuxuan Lan, Richard W. Harvey, Barry-John Theobald, Jacob L. Newman, Stephen J. Cox. The Challenge of Multispeaker Lip-Reading. AVSP, pp. 179-184 (2008).
Herbert Bay, Tinne Tuytelaars, Luc Van Gool. SURF: Speeded Up Robust Features. European Conference on Computer Vision, vol. 1, pp. 404-417 (2006). DOI: 10.1007/11744023_32
Martin Cooke, Jon Barker, Stuart Cunningham, Xu Shao. An audio-visual corpus for speech perception and automatic speech recognition. Journal of the Acoustical Society of America, vol. 120, pp. 2421-2424 (2006). DOI: 10.1121/1.2229005
Harry McGurk, John MacDonald. Hearing lips and seeing voices. Nature, vol. 264, pp. 746-748 (1976). DOI: 10.1038/264746A0
Victor Zue, Stephanie Seneff, James Glass. Speech database development at MIT: Timit and beyond. Speech Communication, vol. 9, pp. 351-356 (1990). DOI: 10.1016/0167-6393(90)90010-7
Xiangxin Zhu, D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. Computer Vision and Pattern Recognition, pp. 2879-2886 (2012). DOI: 10.1109/CVPR.2012.6248014
Ziheng Zhou, Guoying Zhao, Matti Pietikainen. Towards a practical lipreading system. Computer Vision and Pattern Recognition, pp. 137-144 (2011). DOI: 10.1109/CVPR.2011.5995345
Xin Liu, Yiu-ming Cheung. Learning Multi-Boosted HMMs for Lip-Password Based Speaker Verification. IEEE Transactions on Information Forensics and Security, vol. 9, pp. 233-246 (2014). DOI: 10.1109/TIFS.2013.2293025