Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

Authors: Paolo Favaro, Hailin Jin, Simon Jenni

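Abstract: We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image. We argue that the generalization capability of learned features depends on what image neighborhood size is sufficient to discriminate different image transformations: the larger the required neighborhood size, the more global the image statistics that the features can describe. An accurate description of global image statistics makes it possible to better represent the shape and configuration of objects and their context, which ultimately generalizes better to new tasks such as object classification and detection. This suggests a criterion to choose and design image transformations. Based on this criterion, we introduce a novel image transformation that we call limited context inpainting (LCI). This transformation inpaints an image patch conditioned only on a small rectangular pixel boundary (the limited context). Because of the limited boundary information, the inpainter can learn to match local pixel statistics, but is unlikely to match the global statistics of the image. We claim that the same principle can be used to justify the performance of transformations such as image rotations and warping. Indeed, we demonstrate experimentally that learning to discriminate transformations such as LCI, image warping, and rotations yields features with state-of-the-art generalization capabilities on several datasets such as Pascal VOC, STL-10, CelebA, and ImageNet. Remarkably, our trained features achieve a performance on Places on par with features trained through supervised learning with ImageNet labels.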
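
To make the geometry of the pretext task more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a square patch surrounded by a frame of fixed width, and it only builds the two pixel masks that limited context inpainting implies (the patch to inpaint and the thin boundary the inpainter may see), plus labeled rotation examples for a transformation classifier. The inpainting network itself is not shown. All names (`lci_masks`, `rotate90`, `rotation_examples`) and the nested-list image format are hypothetical.

```python
def lci_masks(height, width, top, left, patch, border):
    """Return (patch_mask, context_mask) as nested lists of 0/1.

    patch_mask marks the square region to be inpainted. context_mask marks
    the frame of `border` pixels around it, which in this sketch is the only
    part of the image the inpainter would be allowed to see.
    """
    if top - border < 0 or left - border < 0:
        raise ValueError("context frame falls outside the image")
    if top + patch + border > height or left + patch + border > width:
        raise ValueError("context frame falls outside the image")

    patch_mask = [[0] * width for _ in range(height)]
    context_mask = [[0] * width for _ in range(height)]
    for r in range(top - border, top + patch + border):
        for c in range(left - border, left + patch + border):
            inside = top <= r < top + patch and left <= c < left + patch
            if inside:
                patch_mask[r][c] = 1
            else:
                context_mask[r][c] = 1
    return patch_mask, context_mask


def rotate90(image):
    """Rotate a nested-list image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_examples(image):
    """Return [(rotated_image, label)] for 0, 90, 180 and 270 degrees.

    The label is the class index a transformation classifier would predict.
    """
    examples = []
    current = [list(row) for row in image]
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


if __name__ == "__main__":
    patch_mask, context_mask = lci_masks(8, 8, top=3, left=3, patch=2, border=1)
    print(sum(map(sum, patch_mask)), sum(map(sum, context_mask)))
    for rotated, label in rotation_examples([[1, 2], [3, 4]]):
        print(label, rotated)
```

The narrow frame is the point of the design described in the abstract: with so little context, an inpainter can plausibly match local pixel statistics but not the global layout, so telling inpainted images from real ones should require features that look beyond a small neighborhood.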
