Adaptive Hierarchical Hyper-gradient Descent

Authors: Andrey Vasnev, Junbin Gao, Minh-Ngoc Tran, Renlong Jie

DOI:

Keywords:

Abstract: Adaptive learning rates can lead to faster convergence and better final performance for deep learning models. There are several widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta-learning approaches including learning to learn. However, the issue of balancing adaptiveness and over-parameterization is still a topic to be addressed. In this study, we investigate different levels of learning rate adaptation based on the framework of hyper-gradient descent, and further propose a method that adaptively learns the model parameters for combining different levels of adaptation. Meanwhile, we show the relationship between adding regularization on over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances with statistical significance.
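The hyper-gradient descent framework the abstract builds on (Baydin et al., 2017) adapts the learning rate itself by gradient descent: the hyper-gradient of the loss with respect to the step size is the negative dot product of the current and previous gradients. Below is a minimal, self-contained sketch of that single-level update rule; the function names and the quadratic test objective are illustrative, not the authors' code, and the multi-level combination proposed in the paper is not shown.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose scalar learning rate alpha is itself updated by
    gradient descent on the hyper-gradient (hyper-descent).

    grad_fn -- returns the gradient of the loss at theta
    alpha   -- initial learning rate
    beta    -- hyper learning rate for adapting alpha
    """
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # d(loss)/d(alpha) = -g . g_prev, so descending on it
        # increases alpha when successive gradients agree.
        alpha = alpha + beta * np.dot(g, g_prev)
        theta = theta - alpha * g
        g_prev = g
    return theta, alpha

# Illustrative use: minimize f(theta) = 0.5 * ||theta||^2.
theta_final, alpha_final = hypergradient_sgd(
    grad_fn=lambda th: th,           # gradient of the quadratic
    theta=np.array([1.0, 1.0]),
    alpha=0.1, beta=0.01, steps=50)
```

When successive gradients point the same way, the dot product is positive and alpha grows; when the iterate overshoots and gradients reverse, alpha shrinks, which is the self-correcting behaviour the paper extends to per-layer and per-parameter levels.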

References (26)
Kaiming He, Jian Sun, Convolutional Neural Networks at Constrained Time Cost. Computer Vision and Pattern Recognition (CVPR), pp. 5353–5360 (2015), 10.1109/CVPR.2015.7299173
Daniel Svozil, Vladimír Kvasnicka, Jiří Pospíchal, Introduction to Multi-Layer Feed-Forward Neural Networks. Chemometrics and Intelligent Laboratory Systems, vol. 39, pp. 43–62 (1997), 10.1016/S0169-7439(97)00061-0
Yoshua Bengio, Rémi Bardenet, James S. Bergstra, Balázs Kégl, Algorithms for Hyper-Parameter Optimization. Neural Information Processing Systems, vol. 24, pp. 2546–2554 (2011)
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, vol. 86, pp. 2278–2324 (1998), 10.1109/5.726791
John Duchi, Elad Hazan, Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, vol. 12, pp. 2121–2159 (2011)
Terrence L. Fine, V. Nair, S. L. Lauritzen, M. Jordan, J. Lawless, Feedforward Neural Network Methodology (1999)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition. Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016), 10.1109/CVPR.2016.90
Yuval Netzer, Andrew Y. Ng, Adam Coates, Alessandro Bissacco, Tao Wang, Bo Wu, Reading Digits in Natural Images with Unsupervised Feature Learning (2011)
Sebastian Ruder, An Overview of Gradient Descent Optimization Algorithms. arXiv preprint (2016)
Atilim Gunes Baydin, Frank Wood, Robert Cornish, Mark Schmidt, David Martinez Rubio, Online Learning Rate Adaptation with Hypergradient Descent. arXiv preprint (2017)