Going Deeper into First-Person Activity Recognition

Authors: Minghuang Ma, Haoqi Fan, Kris M. Kitani

DOI: 10.1109/CVPR.2016.209

Abstract: We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNN). Recent work has shown that features such as hand appearance, object attributes, local motion and camera ego-motion are important for characterizing first-person actions. To integrate these ideas under one framework, we propose a twin stream network architecture, where one stream analyzes appearance information and the other stream analyzes motion information. Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects. By visualizing certain neuron activations of our network, we show that the proposed architecture naturally learns features that capture object attributes and hand-object configurations. Extensive experiments on benchmark datasets show that our architecture enables recognition rates that significantly outperform state-of-the-art techniques, an average 6.6% increase in accuracy over all datasets. Furthermore, by learning to recognize objects, actions and activities jointly, the performance on the individual tasks also improves, by 30% (actions) and 14% (objects). We also include the results of an ablative analysis to highlight the importance of network design decisions.
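To make the twin stream idea concrete, below is a minimal PyTorch sketch of a two-stream network with an auxiliary hand-segmentation head on the appearance stream, optical-flow input on the motion stream, and joint object/action/activity heads on the fused features. All layer sizes, class counts, and the names used here (AppearanceStream, hand_head, object_head, and so on) are illustrative assumptions for exposition, not the authors' published architecture or training setup.

```python
# Hypothetical sketch of a "twin stream" CNN for egocentric activity recognition.
# Layer sizes, heads, and fusion scheme are assumptions, not the paper's exact model.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv -> ReLU -> 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class AppearanceStream(nn.Module):
    """RGB stream with an auxiliary hand-segmentation head (illustrative)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # Auxiliary head: coarse per-pixel hand-mask logits (1 channel).
        self.hand_head = nn.Conv2d(128, 1, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, rgb):
        feat = self.backbone(rgb)
        hand_mask = self.hand_head(feat)              # low-resolution mask logits
        return self.pool(feat).flatten(1), hand_mask  # global feature + mask


class MotionStream(nn.Module):
    """Optical-flow stream (stacked x/y flow over several frames)."""
    def __init__(self, flow_channels=10):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_block(flow_channels, 32), conv_block(32, 64), conv_block(64, 128)
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, flow):
        return self.pool(self.backbone(flow)).flatten(1)


class TwinStreamNet(nn.Module):
    """Fuses both streams and predicts objects, actions and activities jointly."""
    def __init__(self, n_objects=20, n_actions=10, n_activities=30):
        super().__init__()
        self.appearance = AppearanceStream()
        self.motion = MotionStream()
        self.fuse = nn.Sequential(nn.Linear(256, 256), nn.ReLU(inplace=True))
        self.object_head = nn.Linear(256, n_objects)
        self.action_head = nn.Linear(256, n_actions)
        self.activity_head = nn.Linear(256, n_activities)

    def forward(self, rgb, flow):
        app_feat, hand_mask = self.appearance(rgb)
        mot_feat = self.motion(flow)
        fused = self.fuse(torch.cat([app_feat, mot_feat], dim=1))
        return {
            "object": self.object_head(fused),
            "action": self.action_head(fused),
            "activity": self.activity_head(fused),
            "hand_mask": hand_mask,
        }


if __name__ == "__main__":
    net = TwinStreamNet()
    rgb = torch.randn(2, 3, 128, 128)     # batch of RGB frames
    flow = torch.randn(2, 10, 128, 128)   # 5 frames of stacked x/y flow
    out = net(rgb, flow)
    print({k: v.shape for k, v in out.items()})
```

In this sketch, joint training would sum a segmentation loss on hand_mask with classification losses on the three heads, which mirrors the abstract's point that learning objects, actions and activities together benefits each individual task.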
