PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

Authors: Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe

DOI: 10.1109/CVPR.2018.00077

Abstract: Depth estimation and scene parsing are two particularly important tasks in visual scene understanding. In this paper we tackle the problem of simultaneous depth estimation and scene parsing in a joint CNN. The task can typically be treated as a deep multi-task learning problem [42]. Different from previous methods that directly optimize multiple tasks given the input training data, this paper proposes a novel multi-task guided prediction-and-distillation network (PAD-Net), which first predicts a set of intermediate auxiliary tasks ranging from low level to high level; the predictions from these intermediate auxiliary tasks are then utilized as multi-modal input, via our proposed multi-modal distillation modules, for the final tasks. During the joint learning, the intermediate tasks not only act as supervision for learning more robust deep representations but also provide rich multi-modal information for improving the final tasks. Extensive experiments are conducted on two challenging datasets (i.e., NYUD-v2 and Cityscapes) for both the depth estimation and scene parsing tasks, demonstrating the effectiveness of the proposed approach.
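The two-stage flow described in the abstract (predict intermediate auxiliary tasks first, then distill those predictions into the final tasks) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The task names `aux_low` and `aux_high`, all channel sizes, and the use of concatenation followed by a per-pixel linear map as the "distillation" step are assumptions made only for illustration.

```python
import numpy as np


def conv1x1(x, w):
    """Per-pixel linear map: x is (H, W, C_in), w is (C_in, C_out)."""
    return x @ w


def pad_net_sketch(features, inter_weights, fuse_weights, final_weights):
    # Stage 1: predict every intermediate auxiliary task from shared features.
    inter_preds = {name: conv1x1(features, w) for name, w in inter_weights.items()}

    # Stage 2 (ASSUMED form of distillation): concatenate all intermediate
    # predictions, then apply a task-specific 1x1 map with a ReLU.
    fused = np.concatenate([inter_preds[k] for k in sorted(inter_preds)], axis=-1)
    outputs = {}
    for task, w_final in final_weights.items():
        distilled = np.maximum(conv1x1(fused, fuse_weights[task]), 0.0)
        outputs[task] = conv1x1(distilled, w_final)
    return inter_preds, outputs


rng = np.random.default_rng(0)
H, W, C = 4, 5, 8                      # hypothetical feature-map size
features = rng.standard_normal((H, W, C))

# Hypothetical intermediate tasks with 1 and 3 output channels.
inter_weights = {
    "aux_low": rng.standard_normal((C, 1)),
    "aux_high": rng.standard_normal((C, 3)),
}
# Final tasks named in the abstract: depth (1 channel) and scene parsing
# (6 classes here, an arbitrary choice).
fuse_weights = {
    "depth": rng.standard_normal((4, 16)),
    "parsing": rng.standard_normal((4, 16)),
}
final_weights = {
    "depth": rng.standard_normal((16, 1)),
    "parsing": rng.standard_normal((16, 6)),
}

inter_preds, outputs = pad_net_sketch(features, inter_weights, fuse_weights, final_weights)
print({k: v.shape for k, v in outputs.items()})
```

In a training setup following the abstract, losses would be attached both to `inter_preds` (the intermediate supervision) and to `outputs` (the final tasks); the sketch covers only the forward pass.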

References (59)
Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. Computer Vision – ECCV 2012, pp. 746-760 (2012). DOI: 10.1007/978-3-642-33715-4_54
Vijay Badrinarayanan, Roberto Cipolla, Ankur Handa. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. Computer Vision and Pattern Recognition (2015).
Mingsheng Long, Jianmin Wang. Learning Multiple Tasks with Deep Relationship Networks. arXiv: Learning (2015).
Ross Girshick, Jitendra Malik, Bharath Hariharan, Pablo Arbeláez. Simultaneous Detection and Segmentation. European Conference on Computer Vision, pp. 297-312 (2014). DOI: 10.1007/978-3-319-10584-0_20
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. International Conference on Machine Learning, vol. 3, pp. 2048-2057 (2015).
Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik. Learning Rich Features from RGB-D Images for Object Detection and Segmentation. European Conference on Computer Vision, pp. 345-360 (2014). DOI: 10.1007/978-3-319-10584-0_23
Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Vision and Pattern Recognition (2014).
Hyeonwoo Noh, Seunghoon Hong, Bohyung Han. Learning Deconvolution Network for Semantic Segmentation. International Conference on Computer Vision, pp. 1520-1528 (2015). DOI: 10.1109/ICCV.2015.178
Thang Luong, Hieu Pham, Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. Empirical Methods in Natural Language Processing, pp. 1412-1421 (2015). DOI: 10.18653/V1/D15-1166
Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. Computer Vision and Pattern Recognition, pp. 3431-3440 (2015). DOI: 10.1109/CVPR.2015.7298965