Mask R-CNN

Authors: Georgia Gkioxari, Piotr Dollár, Kaiming He, Ross Girshick

DOI:

Keywords: Image (mathematics); Segmentation; Object (computer science); Computer vision; Object detection; Computer science; Task (computing); Overhead (computing); Minimum bounding box; Code (cryptography); Artificial intelligence

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: this https URL
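The abstract describes the mask branch only at a high level. In the full paper, the per-RoI training loss is the sum L = L_cls + L_box + L_mask, and the mask branch outputs K binary masks of size m × m (one per class) passed through a per-pixel sigmoid. L_mask is the average binary cross-entropy computed only on the mask of the RoI's ground-truth class k. The pure-Python sketch below illustrates just that loss term under simplifying assumptions. Masks are plain nested lists, and the names `Grid`, `bce_with_logits`, `mask_loss`, and `roi_loss` are hypothetical. The network, RoIAlign, and batching are all omitted, so this is not the authors' implementation.

```python
import math
from typing import List

# Hypothetical type alias: one m x m grid of values (plain nested lists, no tensors).
Grid = List[List[float]]


def bce_with_logits(z: float, y: float) -> float:
    """Binary cross-entropy of sigmoid(z) against target y, in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits: List[Grid], gt_class: int, gt_mask: Grid) -> float:
    """Sketch of L_mask for a single RoI (hypothetical helper).

    mask_logits holds K grids of raw logits, one per class. Only the grid for
    gt_class is compared with the binary target gt_mask; the other K - 1 grids
    contribute nothing, so classes do not compete for pixels.
    """
    logits = mask_logits[gt_class]
    total = 0.0
    count = 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, y in zip(logit_row, target_row):
            total += bce_with_logits(z, y)  # per-pixel sigmoid + binary cross-entropy
            count += 1
    return total / count  # average over the m x m pixels


def roi_loss(l_cls: float, l_box: float, l_mask: float) -> float:
    """Per-RoI multi-task loss L = L_cls + L_box + L_mask; the terms come from parallel heads."""
    return l_cls + l_box + l_mask


if __name__ == "__main__":
    logits_per_class = [
        [[0.0, 0.0], [0.0, 0.0]],        # class 0: uninformative logits
        [[10.0, -10.0], [-10.0, 10.0]],  # class 1: confident logits
    ]
    target = [[1, 0], [0, 1]]
    print(round(mask_loss(logits_per_class, 0, target), 4))  # 0.6931, i.e. log(2)
    print(mask_loss(logits_per_class, 1, target) < 1e-3)      # True
```

Using an independent sigmoid mask per class, rather than a per-pixel softmax over classes, is what the paper calls decoupling mask and class prediction: the class label comes from the parallel classification head, and the mask branch only has to separate foreground from background for that class.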
