Authors: Shiguang Shan, Xilin Chen, Shuang Yang, Keyu Long, Mingmin Yang
DOI:
Keywords:
Abstract: Large-scale datasets have repeatedly proven their fundamental importance in several research fields, especially for early progress in some emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as lipreading, which has received increasing interest in recent years. We present a naturally-distributed large-scale benchmark for lipreading in the wild, named LRW-1000, which contains 1,000 classes with 718,018 samples from more than 2,000 individual speakers. Each class corresponds to the syllables of a Mandarin word composed of one or several Chinese characters. To the best of our knowledge, it is currently the largest word-level lipreading dataset and the only public large-scale Mandarin lipreading dataset. This dataset aims at covering a "natural" variability over different speech modes and imaging conditions to incorporate challenges encountered in practical applications. It shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up. Besides providing a detailed description of the dataset and its collection pipeline, we evaluate several typical popular lipreading methods and perform a thorough analysis of the results from several aspects. The results demonstrate the consistency and challenges of our dataset, which may open up some new promising directions for future work.