SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Authors: Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla

DOI:

Keywords:

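Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN and also with the well known DeepLab-LargeFOV and DeconvNet architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at this http URL.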
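
The upsampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' Caffe implementation: it is plain Python on a single-channel feature map, it assumes 2x2 max pooling with stride 2, and the function names `max_pool_2x2` and `unpool_with_indices` are hypothetical. The encoder side records where each maximum came from; the decoder side writes each pooled value back to that position and leaves every other position at zero, which gives the sparse map that trainable filters would then densify (that convolution is omitted here).

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2(fmap: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max pooling with stride 2 (assumed window size).

    Returns the pooled map and, for each pooled value, the (row, col)
    position in the input where that maximum was found.
    """
    height, width = len(fmap), len(fmap[0])
    pooled: Grid = []
    indices: List[List[Index]] = []
    for r in range(0, height - 1, 2):
        pooled_row: List[float] = []
        index_row: List[Index] = []
        for c in range(0, width - 1, 2):
            # Collect the four values in the window with their positions.
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled: Grid, indices: List[List[Index]],
                        height: int, width: int) -> Grid:
    """Upsample by placing each pooled value at its stored position.

    Nothing is learned here: all other positions stay zero, so the
    result is sparse.
    """
    upsampled: Grid = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            upsampled[r][c] = value
    return upsampled


if __name__ == "__main__":
    feature_map = [
        [1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0],
    ]
    pooled_map, pool_indices = max_pool_2x2(feature_map)
    for row in unpool_with_indices(pooled_map, pool_indices, 4, 4):
        print(row)
```

Only the index of each maximum has to be kept for the decoder, not the full encoder feature map, which is in line with the abstract's emphasis on memory efficiency during inference.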
