Authors: Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Keywords:
Abstract: Depth estimation and scene parsing are two particularly important tasks in visual scene understanding. In this paper we tackle the problem of simultaneous depth estimation and scene parsing in a joint CNN. The task can typically be treated as a deep multi-task learning problem [42]. Different from previous methods that directly optimize multiple tasks given the input training data, this paper proposes a novel multi-task guided prediction-and-distillation network (PAD-Net), which first predicts a set of intermediate auxiliary tasks ranging from low level to high level; the predictions from these intermediate auxiliary tasks are then utilized as multi-modal input, via our proposed multi-modal distillation modules, for the final tasks. During the joint learning, the intermediate tasks not only act as supervision for learning more robust deep representations but also provide rich multi-modal information for improving the final tasks. Extensive experiments are conducted on two challenging datasets (i.e. NYUD-v2 and Cityscapes) for both the depth estimation and scene parsing tasks, demonstrating the effectiveness of the proposed approach.