Combining multiple sources of knowledge in deep CNNs for action recognition

Authors: Eunbyung Park, Xufeng Han, Tamara L. Berg, Alexander C. Berg

DOI: 10.1109/WACV.2016.7477589

Abstract: Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional handcrafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary, hand-crafted feature (e.g. optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources that results in robust prediction by amplifying or suppressing the feature activations based on their agreement. We test these methods in the context of action recognition, where information from spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and outperform methods using only CNNs and optical flow features.
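
The two ideas named in the abstract, spatially varying soft-gating and multiplicative fusion, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `soft_gate` and `multiplicative_fusion`, the max-normalization of the gate, and the `(channels, height, width)` array layout are all assumptions made for illustration only.

```python
import numpy as np


def soft_gate(features, aux_map, eps=1e-8):
    """Scale a (C, H, W) feature map by a per-location gate in [0, 1].

    Assumption: the gate is the auxiliary (H, W) map (for example an
    optical-flow magnitude) divided by its maximum. The paper may use
    a different normalization.
    """
    gate = aux_map / (aux_map.max() + eps)
    # Broadcast the (H, W) gate across all channels.
    return features * gate[np.newaxis, :, :]


def multiplicative_fusion(features_a, features_b):
    """Element-wise product of two feature maps with identical shape.

    With non-negative activations, the product is large only where
    both inputs are large, and small where either one is small.
    """
    return features_a * features_b


# Toy inputs standing in for spatial-stream and temporal-stream features.
rng = np.random.default_rng(0)
spatial = rng.random((4, 3, 3))
temporal = rng.random((4, 3, 3))

# Hypothetical flow magnitude: zero motion at (0, 0), strongest at (1, 2).
flow_magnitude = np.array([[0.0, 1.0, 2.0],
                           [0.0, 0.0, 4.0],
                           [1.0, 1.0, 0.0]])

amplified = soft_gate(spatial, flow_magnitude)
fused = multiplicative_fusion(spatial, temporal)
```

In this sketch, locations with no motion are fully suppressed and the location with the strongest motion passes through almost unchanged, which is one simple reading of "soft-gating". The fusion step rewards agreement between the two streams, in contrast to averaging, where one strong stream can dominate.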
