Semantic Model Vectors for Complex Video Event Recognition

Authors: Michele Merler, Bert Huang, Lexing Xie, Gang Hua, Apostol Natsev

DOI: 10.1109/TMM.2011.2168948

Abstract: We propose semantic model vectors, an intermediate-level representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The vectors are extracted using a set of discriminative classifiers, each being an ensemble of SVM models trained on thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed representation outperforms, and is complementary to, other low-level visual descriptors for video event modeling. We hence present an end-to-end detection system, which combines semantic model vectors with other static or dynamic descriptors extracted at the frame, segment, or full-clip level. We perform a comprehensive empirical study on the 2010 TRECVID Multimedia Event Detection (MED) task (http://www.nist.gov/itl/iad/mig/med10.cfm), which validates semantic model vectors not only as the best individual descriptor, outperforming state-of-the-art global and local static features as well as the spatio-temporal HOG and HOF descriptors, but also as the most compact. We also study early and late feature fusion across the various approaches, leading to a 15% performance boost and an overall system performance of 0.46 mean average precision. In order to promote further research in this direction, we have made our semantic model vectors for the TRECVID MED 2010 set publicly available for community use (http://www1.cs.columbia.edu/~mmerler/SMV.html).
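The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes that each concept detector is an ensemble of linear SVMs given as hypothetical (weights, bias) pairs, that an ensemble's score is the mean of its members' decision values, and that a clip-level vector is the average of frame-level vectors. Early fusion is shown as feature concatenation and late fusion as a weighted average of detector scores; average precision uses the standard non-interpolated definition. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical detector form: a linear SVM as (weight vector, bias).
LinearSVM = Tuple[List[float], float]


def svm_decision(model: LinearSVM, x: Sequence[float]) -> float:
    """Signed decision value w.x + b of one linear SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def concept_score(ensemble: List[LinearSVM], x: Sequence[float]) -> float:
    """Assumption: an ensemble's output is the mean of its members' decision values."""
    return sum(svm_decision(m, x) for m in ensemble) / len(ensemble)


def semantic_model_vector(frames: List[Sequence[float]],
                          detectors: List[List[LinearSVM]]) -> List[float]:
    """One dimension per concept (280 in the paper), averaged over frames (assumed pooling)."""
    per_frame = [[concept_score(ens, f) for ens in detectors] for f in frames]
    n = len(per_frame)
    return [sum(col) / n for col in zip(*per_frame)]


def early_fusion(*features: Sequence[float]) -> List[float]:
    """Early fusion: concatenate descriptors before training a single classifier."""
    fused: List[float] = []
    for f in features:
        fused.extend(f)
    return fused


def late_fusion(scores: Sequence[float], weights: Sequence[float] = ()) -> float:
    """Late fusion: weighted average of per-descriptor detector scores (equal weights by default)."""
    if not weights:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(per_event: List[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """MAP: average of per-event AP values."""
    return sum(average_precision(s, l) for s, l in per_event) / len(per_event)


if __name__ == "__main__":
    # Two toy concepts over 2-D frame features.
    detectors = [
        [([1.0, 0.0], 0.0), ([0.5, 0.0], 0.5)],  # concept A: two-member ensemble
        [([0.0, 1.0], -1.0)],                    # concept B: single SVM
    ]
    frames = [[1.0, 2.0], [3.0, 0.0]]
    print(semantic_model_vector(frames, detectors))  # [1.75, 0.0]
```

Concatenation keeps all descriptor dimensions for one joint model, while score averaging lets each descriptor keep its own detector; the paper compares both strategies, and the equal default weights here are only a placeholder.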