Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation

Authors: Sina Honari, Christopher Pal, Jason Yosinski, Pascal Vincent

Abstract: Deep neural networks with alternating convolutional, max-pooling and decimation layers are widely used in state-of-the-art architectures for computer vision. Max-pooling purposefully discards precise spatial information in order to create features that are more robust, typically organized as lower-resolution spatial feature maps. On some tasks, such as whole-image classification, max-pooling-derived features are well suited; however, for tasks requiring precise localization, such as pixel-level prediction and segmentation, max-pooling destroys exactly the information required to perform well. Precise localization may be preserved by shallow convnets without pooling, but at the expense of robustness. Can we have our max-pooled multi-layered cake and eat it too? Several papers have proposed summation- and concatenation-based methods for combining upsampled coarse, abstract features with finer features to produce robust pixel-level predictions. Here we introduce another model, dubbed Recombinator Networks, where coarse features inform finer features early in their formation, such that finer features can make use of several layers of computation in deciding how to use coarse features. The model is trained once, end-to-end, and performs better than summation-based architectures, reducing the error from the previous state of the art on two facial keypoint datasets, AFW and AFLW, by 30% and beating the current state of the art on 300W without using extra data. We improve performance even further by adding a denoising prediction model based on a novel convnet formulation.
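The difference between summation-based and concatenation-based merging of coarse and fine feature maps can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, nearest-neighbour upsampling, and the 1x1 convolution standing in for the learned layers are all assumptions chosen for illustration.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (channels, height, width) map (assumed choice)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def merge_by_sum(coarse, fine):
    """Summation-style merge: upsampled coarse map is added to the fine map."""
    return upsample_nearest(coarse) + fine

def merge_by_concat(coarse, fine):
    """Concatenation-style merge: maps are stacked along the channel axis,
    leaving it to later layers to decide how to combine them."""
    return np.concatenate([upsample_nearest(coarse), fine], axis=0)

def conv1x1(x, weights):
    """1x1 convolution; weights has shape (out_channels, in_channels)."""
    return np.einsum("oc,chw->ohw", weights, x)

# Hypothetical shapes: 4 channels, coarse map at half the fine resolution.
rng = np.random.default_rng(0)
coarse = rng.standard_normal((4, 8, 8))
fine = rng.standard_normal((4, 16, 16))

summed = merge_by_sum(coarse, fine)       # shape (4, 16, 16)
stacked = merge_by_concat(coarse, fine)   # shape (8, 16, 16)

# A layer applied after concatenation (random weights here, learned in practice).
weights = rng.standard_normal((4, 8))
recombined = np.maximum(conv1x1(stacked, weights), 0.0)  # ReLU

# With weights [I | I], the 1x1 convolution over the concatenation reproduces the sum.
identity_pair = np.hstack([np.eye(4), np.eye(4)])
sum_as_special_case = conv1x1(stacked, identity_pair)
```

In this toy setting, summation is one particular fixed choice of weights over the concatenated maps, whereas concatenation followed by further layers lets the combination be learned, which is the intuition the abstract gives for letting finer features decide how to use coarse features.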
