Authors: Alessandro Prest, C. Leistner, J. Civera, C. Schmid, V. Ferrari
DOI: 10.1109/CVPR.2012.6248065
Keywords:
Abstract: Object detectors are typically trained on a large set of still images annotated by bounding-boxes. This paper introduces an approach for learning object detectors from real-world web videos known only to contain objects of a target class. We propose a fully automatic pipeline that localizes objects in a set of videos of the class and learns a detector for it. The approach extracts candidate spatio-temporal tubes based on motion segmentation and then selects one tube per video jointly over all videos. To compare to the state of the art, we test our detector on still images, i.e., Pascal VOC 2007. We observe that frames extracted from web videos can differ significantly in terms of quality from still images taken by a good camera. Thus, we formulate the learning from videos as a domain adaptation task. We show that training from a combination of weakly annotated videos and fully annotated still images using domain adaptation improves the performance of a detector trained from still images alone.