Authors: Sina Honari, Christopher Pal, Jason Yosinski, Pascal Vincent
DOI:
Keywords:
Abstract: Deep neural networks with alternating convolutional, max-pooling and decimation layers are widely used in state-of-the-art architectures for computer vision. Max-pooling purposefully discards precise spatial information in order to create features that are more robust, typically organized as lower-resolution spatial feature maps. On some tasks, such as whole-image classification, max-pooling-derived features are well suited; however, for tasks requiring precise localization, such as pixel-level prediction and segmentation, max-pooling destroys exactly the information required to perform well. Precise localization may be preserved by shallow convnets without pooling, but at the expense of robustness. Can we have our max-pooled multi-layered cake and eat it too? Several papers have proposed summation- and concatenation-based methods for combining upsampled coarse, abstract features with finer features to produce robust pixel-level predictions. Here we introduce another model, dubbed Recombinator Networks, where coarse features inform finer features early in their formation, such that finer features can make use of several layers of computation in deciding how to use coarse features. The model is trained once, end-to-end, and performs better than summation-based architectures, reducing the error from the previous state of the art on two facial keypoint datasets, AFW and AFLW, by 30% and beating the current state of the art on 300W without using extra data. We improve performance even further by adding a denoising prediction model based on a novel convnet formulation.
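The contrast the abstract draws between summation-based fusion and letting coarse features feed into the formation of finer ones can be sketched on toy feature maps. This is only an illustration under stated assumptions, not the paper's architecture: the function names are hypothetical, nearest-neighbor upsampling and a 1x1 channel-mixing step stand in for whatever upsampling and convolutional layers the model actually uses, and the weights are hand-picked rather than learned.

```python
# Feature maps are nested lists indexed as [channel][row][col].
# All names here are hypothetical and exist only for this sketch.

def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbor)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def fuse_by_summation(coarse, fine):
    """Summation-style fusion: add the upsampled coarse map to the fine map."""
    up = upsample_nearest_2x(coarse)
    return [
        [[u + f for u, f in zip(u_row, f_row)] for u_row, f_row in zip(u_ch, f_ch)]
        for u_ch, f_ch in zip(up, fine)
    ]


def fuse_by_concatenation(coarse, fine):
    """Stack upsampled coarse channels with the fine channels, so that later
    layers can decide how to use the coarse information."""
    return upsample_nearest_2x(coarse) + fine


def conv1x1(fmap, weights):
    """Per-pixel channel mixing; weights[o][i] maps input channel i to output o.
    Assumption: a stand-in for the convolutional layers that would follow."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w_out[i] * fmap[i][r][c] for i in range(len(fmap))) for c in range(width)]
            for r in range(height)
        ]
        for w_out in weights
    ]


coarse = [[[1.0]]]                      # 1 channel, 1x1 (low resolution)
fine = [[[0.0, 1.0], [2.0, 3.0]]]       # 1 channel, 2x2 (high resolution)

summed = fuse_by_summation(coarse, fine)
stacked = fuse_by_concatenation(coarse, fine)
mixed = conv1x1(stacked, [[0.5, 2.0]])  # hand-picked weights, not learned
```

With summation, the coarse and fine maps are merged with fixed, equal weight. With concatenation followed by further layers, the weighting of coarse against fine information becomes something the network can adjust, which is the intuition behind letting finer features "decide how to use coarse features".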