Activity Driven Weakly Supervised Object Detection

Authors: Deepti Ghadiyaram, Vignesh Ramanathan, Dhruv Mahajan, Zhenheng Yang, Ram Nevatia

DOI:

Keywords: Object detection, Computer science, Object class, Leverage (statistics), Computer vision, Minimum bounding box, Artificial intelligence

Abstract: Weakly supervised object detection aims at reducing the amount of supervision required to train detection models. Such models are traditionally learned from images/videos labelled only with the object class and not the object bounding box. In our work, we try to leverage not only the object class labels but also the action labels associated with the data. We show that the action depicted in the image/video can provide strong cues about the location of the associated object. We learn a spatial prior for the object dependent on the action (e.g. "ball" is closer to "leg of the person" in "kicking ball"), and incorporate this prior to simultaneously train a joint object detection and action classification model. We conducted experiments on both video datasets and image datasets to evaluate the performance of our weakly supervised object detection model. Our approach outperformed the current state-of-the-art (SOTA) method by more than 6% in mAP on the Charades dataset.
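To make the idea of an action-dependent spatial prior concrete, the following is a minimal sketch and not the authors' model. It assumes a fixed Gaussian prior around a reference body keypoint, whereas the abstract describes a prior that is learned jointly with the detector. The table `ACTION_PRIOR`, its numeric values, and the helpers `prior_weight` and `rescore` are all hypothetical names introduced only for illustration.

```python
import math

# Hypothetical, hand-set prior (NOT learned, NOT from the paper): for each action,
# the reference keypoint, the expected offset from that keypoint to the object
# centre (in units of person height), and a spread.
ACTION_PRIOR = {
    "kicking ball": {"keypoint": "leg", "offset": (0.0, 0.1), "sigma": 0.2},
}

def box_center(box):
    """Centre of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def prior_weight(box, keypoints, person_height, action):
    """Gaussian weight, highest when the box centre lies at the expected
    offset from the action's reference keypoint."""
    prior = ACTION_PRIOR[action]
    kx, ky = keypoints[prior["keypoint"]]
    cx, cy = box_center(box)
    dx = (cx - kx) / person_height - prior["offset"][0]
    dy = (cy - ky) / person_height - prior["offset"][1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * prior["sigma"] ** 2))

def rescore(proposals, scores, keypoints, person_height, action):
    """Multiply each proposal's classifier score by its spatial prior weight."""
    return [
        score * prior_weight(box, keypoints, person_height, action)
        for box, score in zip(proposals, scores)
    ]

# Toy example: one proposal near the leg, one far away with a higher raw score.
keypoints = {"leg": (100.0, 200.0)}
proposals = [(90, 210, 110, 230), (300, 20, 340, 60)]
scores = [0.5, 0.6]
rescored = rescore(proposals, scores, keypoints, person_height=200.0,
                   action="kicking ball")
best = max(range(len(proposals)), key=lambda i: rescored[i])
```

In this toy setup the proposal near the leg is selected even though its raw classifier score is lower, which is the kind of cue the abstract attributes to the action label.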
