EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks

Authors: Kun Zhan, Yi Yang, Guoliang Kang, Xuanyi Dong

DOI:

Keywords: Word error rate, Computer science, Convolutional neural network, Feature (machine learning), Algorithm, Rectifier (neural networks), Artificial neural network, Layer (object-oriented design), Artificial intelligence, Convolution, Blocking (statistics)

Abstract: For most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a standard component accompanying each layer. Although ReLU can ease network training to an extent, its character of blocking negative values may suppress the propagation of useful information and lead to difficulty in optimizing very deep Convolutional Neural Networks (CNNs). Moreover, stacking layers with nonlinear activations makes it hard to approximate the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing the ReLUs of certain layers and apply it to various representative architectures following deterministic rules. It can ease optimization and improve generalization performance for CNN models. We find two key factors to be essential to the improvement: 1) the location where ReLU should be erased inside the basic module; 2) the proportion of modules in which to erase ReLU. We show that erasing the last ReLU layer of all basic modules in a network usually yields improved performance. In experiments, our approach successfully improves various representative architectures, and we report results on SVHN, CIFAR-10/100, and ImageNet. We achieve competitive single-model performance on CIFAR-100, with a 16.53% error rate, compared with the state-of-the-art.
