Tree-structured reinforcement learning for sequential object localization

Authors: Xiaodan Liang, Shuicheng Yan, Jiashi Feng, Wen Feng Lu, Zequn Jie

DOI:

Keywords: Machine learning, Pascal (programming language), Traverse, Mathematics, Reinforcement learning, Perception, Interdependence, Feed forward, Data mining, Object-oriented design, Artificial intelligence

Abstract: Existing object proposal algorithms usually search for possible object regions over multiple locations and scales separately, which ignores the interdependency among different objects and deviates from the human perception procedure. To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach that sequentially searches for objects by fully exploiting both the current observation and historical search paths. Tree-RL learns searching policies by maximizing a long-term reward that reflects the localization accuracy over all objects. Starting by taking the entire image as a proposal, Tree-RL allows the agent to sequentially discover multiple objects via a tree-structured traversing scheme. By allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects in a single feed-forward pass. Therefore, Tree-RL can better cover different objects at various scales, which is quite appealing in the context of object proposal. Experiments on PASCAL VOC 2007 and 2012 validate the effectiveness of Tree-RL, which achieves recalls comparable to current object proposal algorithms with far fewer candidate windows.
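The abstract describes the search only at a high level. The sketch below illustrates one way a tree-structured traversal of this kind could work, under stated assumptions: the first window is the whole image, and every node expands into two children, the best-scoring action from a scaling group and the best-scoring action from a translation group, so a tree of depth D yields 2^(D+1) - 1 windows from a single pass. The action sets (3/4 shrinks anchored at corners or center, 1/4 shifts), the sign-of-IoU-change reward, the discount `gamma`, and all function names (`tree_search`, `scaling_actions`, `translation_actions`, `step_reward`, `discounted_return`) are hypothetical illustrations, not the authors' implementation. The oracle policy in the demo merely stands in for a learned network.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Policy = Callable[[Box, Box], float]     # hypothetical: scores moving from current box to a candidate


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def step_reward(old: Box, new: Box, gts: Sequence[Box]) -> float:
    """Assumed per-step reward: sign of the change in the best IoU over ALL objects."""
    before = max(iou(old, g) for g in gts)
    after = max(iou(new, g) for g in gts)
    return float((after > before) - (after < before))


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Long-term return sum_t gamma^t * r_t (gamma is an assumed value)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def scaling_actions(b: Box) -> Dict[str, Box]:
    """Assumed scaling group: shrink to 3/4 size, anchored at a corner or the center."""
    x1, y1, x2, y2 = b
    dw, dh = 0.25 * (x2 - x1), 0.25 * (y2 - y1)
    return {
        "top_left": (x1, y1, x2 - dw, y2 - dh),
        "top_right": (x1 + dw, y1, x2, y2 - dh),
        "bottom_left": (x1, y1 + dh, x2 - dw, y2),
        "bottom_right": (x1 + dw, y1 + dh, x2, y2),
        "center": (x1 + dw / 2, y1 + dh / 2, x2 - dw / 2, y2 - dh / 2),
    }


def translation_actions(b: Box, width: float, height: float) -> Dict[str, Box]:
    """Assumed (simplified) translation group: shift by 1/4 of the box size, clipped to the image."""
    x1, y1, x2, y2 = b
    dw, dh = 0.25 * (x2 - x1), 0.25 * (y2 - y1)
    shifts = {"left": (-dw, 0.0), "right": (dw, 0.0), "up": (0.0, -dh), "down": (0.0, dh)}
    out = {}
    for name, (sx, sy) in shifts.items():
        sx = min(max(sx, -x1), width - x2)  # keep the shifted box inside the image
        sy = min(max(sy, -y1), height - y2)
        out[name] = (x1 + sx, y1 + sy, x2 + sx, y2 + sy)
    return out


def tree_search(image_size: Tuple[int, int], policy: Policy, depth: int) -> List[Box]:
    """Tree-structured traversal: each node spawns its best scaling child and best translation child."""
    width, height = image_size
    root: Box = (0.0, 0.0, float(width), float(height))  # start from the entire image
    proposals, frontier = [root], [root]
    for _ in range(depth):
        next_frontier = []
        for box in frontier:
            for group in (scaling_actions(box), translation_actions(box, width, height)):
                # Greedy choice within each action group according to the policy score.
                next_frontier.append(max(group.values(), key=lambda cand: policy(box, cand)))
        proposals.extend(next_frontier)
        frontier = next_frontier
    return proposals


if __name__ == "__main__":
    gts = [(10.0, 10.0, 60.0, 60.0), (120.0, 80.0, 190.0, 150.0)]
    # Oracle stand-in for a learned policy network (illustration only): prefer higher best-IoU.
    oracle = lambda cur, cand: max(iou(cand, g) for g in gts)
    boxes = tree_search((200, 160), oracle, depth=3)
    print(len(boxes))  # 1 + 2 + 4 + 8 = 15 candidate windows
    print([round(max(iou(b, g) for b in boxes), 3) for g in gts])
```

Taking the best action from each of two groups, rather than a single argmax, is what lets one pass follow several near-optimal paths at once; the number of windows is fixed by the depth rather than by a dense sweep over locations and scales.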

References (24)
Cristian Sminchisescu, Stefan Mathe. Multiple Instance Reinforcement Learning for Efficient Weakly-Supervised Detection in Images. arXiv: Computer Vision and Pattern Recognition, 2014.
Ronan Collobert, Clément Farabet, Koray Kavukcuoglu. Torch7: A Matlab-like Environment for Machine Learning. Neural Information Processing Systems, 2011.
Volodymyr Mnih, Koray Kavukcuoglu, Jimmy Ba. Multiple Object Recognition with Visual Attention. arXiv: Learning, 2014.
Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Vision and Pattern Recognition, 2014.
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver. Playing Atari with Deep Reinforcement Learning. arXiv: Learning, 2013.
Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. Computer Vision and Pattern Recognition, pp. 3431-3440, 2015. DOI: 10.1109/CVPR.2015.7298965
Abel Gonzalez-Garcia, Alexander Vezhnevets, Vittorio Ferrari. An Active Search Strategy for Efficient Object Class Detection. Computer Vision and Pattern Recognition, pp. 3022-3031, 2015. DOI: 10.1109/CVPR.2015.7298921
Jan Hosang, Rodrigo Benenson, Piotr Dollar, Bernt Schiele. What Makes for Effective Detection Proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, pp. 814-830, 2016. DOI: 10.1109/TPAMI.2015.2465908
Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr. BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Computer Vision and Pattern Recognition, vol. 5, pp. 3286-3293, 2014. DOI: 10.1109/CVPR.2014.414
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, vol. 88, pp. 303-338, 2010. DOI: 10.1007/S11263-009-0275-4