Improving video activity recognition using object recognition and text mining

Authors: Raymond J. Mooney, Tanvi S. Motwani

DOI: 10.3233/978-1-61499-098-7-600

Abstract: Recognizing activities in real-world videos is a challenging AI problem. We present a novel combination of standard activity classification, object recognition, and text mining to learn effective activity recognizers without ever explicitly labeling training videos. We cluster verbs used to describe videos to automatically discover classes of activities and produce a labeled training set. This labeled data is then used to train an activity classifier based on spatio-temporal features. Next, text mining is employed to learn the correlations between these verbs and related objects. This knowledge is then used together with the outputs of an off-the-shelf object recognizer and the trained activity classifier to produce an improved activity recognizer. Experiments on a corpus of YouTube videos demonstrate the effectiveness of the overall approach.
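
The abstract does not state how the three sources of evidence are combined, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes verb-object correlations are estimated as smoothed co-occurrence frequencies over tokenized sentences, and that the final score multiplies the activity classifier's probability by an object-support term. The function names, the smoothing constant `alpha`, and all toy data are hypothetical.

```python
from collections import Counter


def cooccurrence_probs(sentences, verbs, objects, alpha=1.0):
    """Estimate P(object | verb) from tokenized sentences.

    Assumption: a verb and an object "co-occur" when both appear in the
    same sentence; alpha is an add-alpha (Laplace) smoothing constant.
    """
    counts = {v: Counter() for v in verbs}
    for tokens in sentences:
        present = set(tokens)
        for v in verbs:
            if v in present:
                for o in objects:
                    if o in present:
                        counts[v][o] += 1
    probs = {}
    for v in verbs:
        total = sum(counts[v].values()) + alpha * len(objects)
        probs[v] = {o: (counts[v][o] + alpha) / total for o in objects}
    return probs


def rescore(activity_probs, object_scores, obj_given_act):
    """Reweight activity probabilities by object evidence, then renormalize.

    Assumption: the support for an activity is the detector-score-weighted
    sum of P(object | activity) over all candidate objects.
    """
    scores = {}
    for activity, p in activity_probs.items():
        support = sum(
            object_scores[o] * obj_given_act[activity][o] for o in object_scores
        )
        scores[activity] = p * support
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}


# Toy data (hypothetical): descriptions, classifier output, detector output.
sentences = [
    ["ride", "horse"],
    ["ride", "bike"],
    ["ride", "horse"],
    ["play", "guitar"],
]
verbs = ["ride", "play"]
objects = ["horse", "guitar", "bike"]

probs = cooccurrence_probs(sentences, verbs, objects)
activity_probs = {"ride": 0.45, "play": 0.55}  # activity classifier alone
object_scores = {"horse": 0.8, "guitar": 0.1, "bike": 0.1}  # object recognizer

posterior = rescore(activity_probs, object_scores, probs)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this toy case the activity classifier alone slightly prefers "play", but a confident "horse" detection shifts the combined score toward "ride", which is the kind of correction the abstract attributes to object evidence.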
