Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

Authors: Alex Kendall, Roberto Cipolla, Yarin Gal

Abstract: Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
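
The abstract describes weighing each task's loss by its homoscedastic uncertainty but does not give the formula. As a minimal sketch, assuming the regression-style objective commonly associated with this approach, each task i contributes L_i / (2 * sigma_i^2) + log(sigma_i), written below in terms of a log-variance s_i = log(sigma_i^2) for numerical stability. The function names, the plain-float inputs, and the use of the same form for every task are illustrative assumptions, not the authors' code; in practice the log-variances would be trainable parameters optimized jointly with the network weights.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using per-task log-variances (assumed form).

    Each task contributes 0.5 * exp(-s_i) * L_i + 0.5 * s_i, where
    s_i = log(sigma_i^2). This equals L_i / (2 * sigma_i^2) + log(sigma_i).
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        # exp(-s) down-weights tasks with high uncertainty; the + 0.5 * s
        # term penalizes letting the uncertainty grow without bound.
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


def optimal_log_var(task_loss):
    """Log-variance minimizing one task's term for a fixed positive loss.

    Setting the derivative -0.5 * exp(-s) * L + 0.5 to zero gives s = log(L).
    """
    if task_loss <= 0:
        raise ValueError("closed form requires a strictly positive loss")
    return math.log(task_loss)


if __name__ == "__main__":
    losses = [4.0, 0.25]  # hypothetical per-task loss values on different scales
    equal = uncertainty_weighted_loss(losses, [0.0, 0.0])
    tuned = uncertainty_weighted_loss(losses, [optimal_log_var(l) for l in losses])
    print(equal, tuned)
```

With every log-variance at zero the combination reduces to half the plain sum of the losses, so losses on larger scales dominate. Moving each log-variance toward log(L_i) rescales every task's weighted term to the same magnitude, which is the balancing effect the abstract attributes to learned weightings.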
