Authors: Deepti Ghadiyaram, Vignesh Ramanathan, Dhruv Mahajan, Zhenheng Yang, Ram Nevatia
DOI:
Keywords: Object detection, Computer science, Object Class, Leverage (statistics), Computer vision, Minimum bounding box, Artificial intelligence
Abstract: Weakly supervised object detection aims at reducing the amount of supervision required to train detection models. Such models are traditionally learned from images/videos labelled only with the object class and not the object bounding box. In our work, we try to leverage not only the object class labels but also the action labels associated with the data. We show that the action depicted in the image/video can provide strong cues about the location of the associated object. We learn a spatial prior for the object dependent on the action (e.g. "ball" is closer to "leg of the person" in "kicking ball"), and incorporate this prior to simultaneously train a joint object detection and action classification model. We conducted experiments on both video datasets and image datasets to evaluate the performance of our weakly supervised object detection model. Our approach outperformed the current state-of-the-art (SOTA) method by more than 6% in mAP on the Charades dataset.
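To make the idea of an action-dependent spatial prior concrete, the following is a minimal sketch and not the authors' model. It assumes a hand-set Gaussian prior over the distance between a candidate box and a body anchor point, whereas the abstract describes a prior that is learned jointly with the detector. The names `ACTION_PRIOR`, `spatial_prior`, and `rerank`, the second action entry, the anchor coordinates, and the sigma values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical action-conditioned prior: for each action, the body anchor the
# object tends to be near, and how tightly (sigma, in units of person height).
# These values are illustrative assumptions, not learned values from the paper.
ACTION_PRIOR = {
    "kicking ball": {"anchor": "leg", "sigma": 0.25},
    "drinking from cup": {"anchor": "head", "sigma": 0.20},
}


def box_center(box):
    """Return the (x, y) center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def spatial_prior(box, anchors, person_height, action):
    """Unnormalized Gaussian weight in (0, 1]; higher when the box center is near the anchor."""
    spec = ACTION_PRIOR[action]
    ax, ay = anchors[spec["anchor"]]
    cx, cy = box_center(box)
    # Normalize the distance by person height so the prior is scale-independent.
    dist = math.hypot(cx - ax, cy - ay) / person_height
    return math.exp(-0.5 * (dist / spec["sigma"]) ** 2)


def rerank(proposals, scores, anchors, person_height, action):
    """Multiply each proposal's appearance score by the prior and sort in descending order."""
    weighted = [
        (s * spatial_prior(b, anchors, person_height, action), b)
        for b, s in zip(proposals, scores)
    ]
    return sorted(weighted, key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    anchors = {"leg": (100.0, 190.0), "head": (100.0, 20.0)}
    proposals = [(90, 180, 110, 200), (300, 20, 320, 40)]
    scores = [0.5, 0.6]  # appearance-only scores; the far box scores higher
    for weight, box in rerank(proposals, scores, anchors, 200.0, "kicking ball"):
        print(box, round(weight, 4))
```

In this toy setup the box next to the leg is ranked first for "kicking ball" even though its appearance score is lower, which is the kind of location cue the abstract attributes to the action label.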