Authors: Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
DOI: 10.1109/TPAMI.2017.2699184
Keywords: Pyramid, Computer vision, Conditional random field, Upsampling, Deep learning, CRFs, Computer science, Graphical model, Convolutional neural network, Pattern recognition, Artificial intelligence, Scale-space segmentation, Convolution, Test set
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or "atrous convolution", as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
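To illustrate the two core ideas mentioned in the abstract (atrous convolution and ASPP), below is a minimal PyTorch sketch, not the authors' original Caffe implementation. The dilation rates (6, 12, 18, 24) and fusion by summation follow the ASPP-L setting described in the paper; the module name, channel counts, and input sizes are illustrative assumptions.

```python
# Minimal ASPP sketch (assumption: PyTorch in place of the paper's Caffe code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel 3x3 atrous convolutions with
    different sampling rates, whose score maps are fused by summation."""
    def __init__(self, in_channels, num_classes, rates=(6, 12, 18, 24)):
        super().__init__()
        # padding=rate keeps the spatial resolution unchanged for each branch.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, num_classes, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Each branch sees a different effective field of view; sum the scores.
        return torch.stack([b(x) for b in self.branches], dim=0).sum(dim=0)

if __name__ == "__main__":
    # Hypothetical backbone features at output stride 16 (e.g. 2048 channels).
    features = torch.randn(1, 2048, 33, 33)
    logits = ASPP(2048, num_classes=21)(features)  # 21 PASCAL VOC classes
    # Bilinear upsampling back toward input resolution, before any CRF step.
    upsampled = F.interpolate(logits, scale_factor=16, mode="bilinear",
                              align_corners=False)
    print(upsampled.shape)  # torch.Size([1, 21, 528, 528])
```

The fully connected CRF post-processing step is not shown here; in the paper it is applied to the upsampled score maps as a separate refinement stage.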