Authors: Andrey Vasnev, Junbin Gao, Minh-Ngoc Tran, Renlong Jie
DOI:
Keywords:
Abstract: Adaptive learning rates can lead to faster convergence and better final performance for deep models. There are several widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta approaches including learning to learn. However, the issue of balancing adaptiveness and over-parameterization is still a topic to be addressed. In this study, we investigate different levels of learning rate adaptation within the framework of hyper-gradient descent, and further propose a method that adaptively learns the model parameters for combining different levels of adaptation. Meanwhile, we show the relationship between adding regularization to over-parameterized learning rates and building combinations of adaptive learning rates. Experiments on network architectures including feed-forward networks, LeNet-5 and ResNet-18/34 show that the proposed multi-level adaptive approach can outperform baseline methods in a variety of circumstances with statistical significance.
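For context, hyper-gradient descent, the framework the abstract builds on, treats the learning rate itself as a parameter updated by gradient descent: the gradient of the loss with respect to the learning rate reduces to the negative dot product of two successive parameter gradients. The snippet below is a minimal sketch of that basic single-level rule (following Baydin et al.'s hypergradient formulation), not the authors' multi-level combination method; the function name `hypergradient_sgd` and arguments such as `grad_fn` and `beta` are hypothetical illustration choices.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose scalar learning rate is itself adapted by gradient descent.

    Since theta_{t-1} = theta_{t-2} - alpha * g_{t-2}, the chain rule gives
    d(loss)/d(alpha) = -g_{t-1} . g_{t-2}: the learning rate grows while
    successive gradients align and shrinks when they oppose each other.
    """
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)                        # gradient at current parameters
        alpha = alpha + beta * np.dot(g, g_prev)  # hyper-gradient step on alpha
        theta = theta - alpha * g                 # ordinary SGD step
        g_prev = g
    return theta, alpha

# Toy usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta_star, alpha_final = hypergradient_sgd(lambda t: t, np.array([5.0, -3.0]))
print(theta_star, alpha_final)
```

The multi-level idea described in the abstract generalizes this by maintaining such adaptive rates at several granularities (e.g. global, layer-wise, parameter-wise) and learning how to combine them, with regularization controlling the resulting over-parameterization.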