Authors: Minghuang Ma, Haoqi Fan, Kris M. Kitani
Keywords:
Abstract: We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNNs). Recent work has shown that features such as hand appearance, object attributes, local hand motion, and camera ego-motion are important for characterizing first-person actions. To integrate these ideas under one framework, we propose a twin-stream network architecture, where one stream analyzes appearance information and the other analyzes motion information. Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects. By visualizing certain neuron activations of our network, we show that the proposed architecture naturally learns features that capture object attributes and hand-object configurations. Extensive experiments on benchmark datasets show that our architecture enables recognition rates that significantly outperform state-of-the-art techniques, with an average 6.6% increase in accuracy over all datasets. Furthermore, by learning to recognize objects, actions, and activities jointly, performance on the individual tasks also improves by 30% (actions) and 14% (objects). We also include the results of an ablative analysis to highlight the importance of network design decisions.
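To make the twin-stream idea concrete, the following toy sketch (not the authors' implementation; all function and weight names are hypothetical, and simple mean-pooling stands in for the actual CNN streams) shows the overall data flow: an appearance stream and a motion stream each produce a feature vector, the two are fused by concatenation, and three linear heads score objects, actions, and activities jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

def appearance_stream(frame):
    # Stand-in for the appearance CNN (which in the paper is also trained
    # to segment hands and localize objects); here: mean-pool over pixels.
    return frame.mean(axis=(0, 1))          # shape (3,)

def motion_stream(flow):
    # Stand-in for the motion CNN over optical-flow input.
    return flow.mean(axis=(0, 1))           # shape (3,)

# Hypothetical fused feature dimension and class counts for the three heads.
W_obj = rng.standard_normal((6, 10))        # 10 object classes
W_act = rng.standard_normal((6, 8))         # 8 action classes
W_activity = rng.standard_normal((6, 12))   # 12 activity classes

frame = rng.standard_normal((224, 224, 3))  # RGB-like input
flow = rng.standard_normal((224, 224, 3))   # flow-like input

# Fuse the two streams, then score all three tasks from the shared feature.
fused = np.concatenate([appearance_stream(frame), motion_stream(flow)])
obj_scores = fused @ W_obj
act_scores = fused @ W_act
activity_scores = fused @ W_activity
print(obj_scores.shape, act_scores.shape, activity_scores.shape)
```

The shared fused representation is what lets the three recognition tasks regularize one another, which is the mechanism behind the joint-training gains reported in the abstract.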