Adaptive Hierarchical Hyper-gradient Descent

Authors: Andrey Vasnev, Junbin Gao, Minh-Ngoc Tran, Renlong Jie

DOI:

Keywords:

Abstract: Adaptive learning rates can lead to faster convergence and better final performance for deep learning models. There are several widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta-learning approaches including learning to learn. However, the issue of balancing adaptiveness and over-parameterization is still a topic to be addressed. In this study, we investigate different levels of learning rate adaptation based on the framework of hyper-gradient descent, and further propose a method that adaptively learns the model parameters for combining different levels of adaptation. Meanwhile, we show the relationship between adding regularization on over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances with statistical significance.
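The hyper-gradient descent framework the abstract builds on (Baydin et al., 2017) adapts the learning rate itself by gradient descent: the hyper-gradient of the loss with respect to the step size is the negative dot product of the current and previous gradients. Below is a minimal, self-contained sketch of that single-level update rule; the function names and the quadratic test objective are illustrative, not the authors' code, and the multi-level combination proposed in the paper is not shown.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose scalar learning rate alpha is itself updated by
    gradient descent on the hyper-gradient (hyper-descent).

    grad_fn -- returns the gradient of the loss at theta
    alpha   -- initial learning rate
    beta    -- hyper learning rate for adapting alpha
    """
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # d(loss)/d(alpha) = -g . g_prev, so descending on it
        # increases alpha when successive gradients agree.
        alpha = alpha + beta * np.dot(g, g_prev)
        theta = theta - alpha * g
        g_prev = g
    return theta, alpha

# Illustrative use: minimize f(theta) = 0.5 * ||theta||^2.
theta_final, alpha_final = hypergradient_sgd(
    grad_fn=lambda th: th,           # gradient of the quadratic
    theta=np.array([1.0, 1.0]),
    alpha=0.1, beta=0.01, steps=50)
```

When successive gradients point the same way, the dot product is positive and alpha grows; when the iterate overshoots and gradients reverse, alpha shrinks, which is the self-correcting behaviour the paper extends to per-layer and per-parameter levels.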

References (26)
Kaiming He, Jian Sun, Convolutional Neural Networks at Constrained Time Cost. Computer Vision and Pattern Recognition (CVPR), pp. 5353–5360 (2015), 10.1109/CVPR.2015.7299173
Daniel Svozil, Vladimír Kvasnicka, Jiří Pospíchal, Introduction to Multi-Layer Feed-Forward Neural Networks. Chemometrics and Intelligent Laboratory Systems, vol. 39, pp. 43–62 (1997), 10.1016/S0169-7439(97)00061-0
Yoshua Bengio, Rémi Bardenet, James S. Bergstra, Balázs Kégl, Algorithms for Hyper-Parameter Optimization. Neural Information Processing Systems, vol. 24, pp. 2546–2554 (2011)
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, vol. 86, pp. 2278–2324 (1998), 10.1109/5.726791
John Duchi, Elad Hazan, Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, vol. 12, pp. 2121–2159 (2011)
Terrence L. Fine, V. Nair, S. L. Lauritzen, M. Jordan, J. Lawless, Feedforward Neural Network Methodology (1999)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition. Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016), 10.1109/CVPR.2016.90
Yuval Netzer, Andrew Y. Ng, Adam Coates, Alessandro Bissacco, Tao Wang, Bo Wu, Reading Digits in Natural Images with Unsupervised Feature Learning (2011)
Sebastian Ruder, An Overview of Gradient Descent Optimization Algorithms. arXiv preprint (2016)
Atilim Gunes Baydin, Frank Wood, Robert Cornish, Mark Schmidt, David Martinez Rubio, Online Learning Rate Adaptation with Hypergradient Descent. arXiv preprint (2017)