Authors: Kun Zhan, Yi Yang, Guoliang Kang, Xuanyi Dong
DOI:
Keywords: Word error rate, Computer science, Convolutional neural network, Feature (machine learning), Algorithm, Rectifier (neural networks), Artificial neural network, Layer (object-oriented design), Artificial intelligence, Convolution, Blocking (statistics)
Abstract: For most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a standard component accompanying each layer. Although ReLU can ease network training to an extent, its character of blocking negative values may suppress the propagation of useful information and lead to difficulty in optimizing very deep Convolutional Neural Networks (CNNs). Moreover, stacking layers with nonlinear activations makes it hard to approximate the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing ReLUs from certain layers and apply it to various representative architectures following deterministic rules. It eases optimization and improves generalization performance for CNN models. We find two key factors essential to the improvement: 1) the location where ReLU should be erased inside the basic module; 2) the proportion of modules in which to erase ReLU; we show that erasing the last ReLU layer of all basic modules usually yields improved performance. In experiments, our approach successfully improves performance, and we report results on SVHN, CIFAR-10/100, and ImageNet. We achieve a competitive single-model CIFAR-100 error rate of 16.53% compared to the state-of-the-art.
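To make the "erase the last ReLU of a basic module" idea concrete, below is a minimal sketch in PyTorch (not the authors' code). The class name `BasicBlock` and the `erase_relu` flag are hypothetical illustrations; the sketch only shows how the final nonlinearity of a residual-style block can be replaced by the identity so that negative values pass through.

```python
# Minimal sketch of erasing the last ReLU in a basic module (assumption: PyTorch;
# names such as BasicBlock and erase_relu are illustrative, not from the paper's code).
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Basic residual-style block whose final ReLU can optionally be erased."""

    def __init__(self, channels: int, erase_relu: bool = False):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # When erase_relu is True, the block's last activation becomes the identity,
        # so negative values in the output are no longer blocked.
        self.last_act = nn.Identity() if erase_relu else nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + x  # residual connection
        return self.last_act(out)


if __name__ == "__main__":
    block = BasicBlock(16, erase_relu=True)  # last ReLU erased in this block
    x = torch.randn(2, 16, 32, 32)
    print(block(x).shape)  # torch.Size([2, 16, 32, 32])
```

In this sketch, the proportion of modules with erased ReLUs would be controlled by how many blocks are constructed with `erase_relu=True` when assembling the full network.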