Contextual Action Recognition with R*CNN

Authors: Georgia Gkioxari, Jitendra Malik, Ross Girshick

Abstract: There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic of jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. We adapt RCNN to use more than one region for classification while still maintaining the ability to localize the action. We call our system R*CNN. The action-specific models and the feature maps are trained jointly, allowing action-specific representations to emerge. R*CNN achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all other approaches in the field by a significant margin. Last, we show that R*CNN is not limited to action recognition. In particular, R*CNN can also be used to tackle fine-grained tasks such as attribute classification. We validate this claim by reporting state-of-the-art performance on the Berkeley Attributes of People dataset.
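The abstract says that more than one region is used for classification but does not spell out how the regions are combined. The sketch below shows one plausible reading, offered as an illustration and not as the authors' implementation: each action's score is the sum of a score for the person (primary) region and the best score among candidate context regions, followed by a softmax over actions. The max-over-context-regions rule, the function name `action_probabilities`, the linear scoring, and the toy feature vectors are all assumptions made for this example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(
    primary_feat: Vector,
    context_feats: Sequence[Vector],
    primary_weights: Sequence[Vector],
    context_weights: Sequence[Vector],
) -> List[float]:
    """Hypothetical two-region scoring: for each action, add the primary-region
    score to the highest-scoring context region, then normalize over actions."""
    scores = []
    for w_p, w_c in zip(primary_weights, context_weights):
        person_score = dot(w_p, primary_feat)
        # Assumption: the most informative context region is chosen per action.
        context_score = max(dot(w_c, f) for f in context_feats)
        scores.append(person_score + context_score)
    return softmax(scores)


# Toy data (made up): two actions, one person region, two context regions.
primary = [1.0, 0.0]
contexts = [[0.0, 1.0], [1.0, 1.0]]
primary_w = [[2.0, 0.0], [0.0, 0.0]]   # per-action weights for the person region
context_w = [[0.0, 1.0], [0.0, 0.0]]   # per-action weights for context regions

probs = action_probabilities(primary, contexts, primary_w, context_w)
print(probs)
```

Because the context region is selected separately for each action, different actions are free to rely on different parts of the scene, which is one way the contextual cues described in the abstract could enter the classifier.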
