Improving Generalization Performance by Switching from Adam to SGD.

Authors: Richard Socher, Nitish Shirish Keskar

DOI:

Keywords:

Abstract: … and ascribe the poor generalization performance to training issues arising … the generalization performance of AMSGrad to be similar to that of Adam on problems where a generalization …
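The title describes switching from Adam to SGD during training. A minimal illustrative sketch of that idea on a toy objective, in plain Python (the hyperparameters and switch point are hypothetical choices, not the paper's actual switching criterion or learning-rate selection):

```python
# Illustrative only: minimize f(w) = (w - 3)^2 by running Adam for a
# fixed number of steps, then switching to plain SGD for the rest.
import math

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0
m, v = 0.0, 0.0
beta1, beta2, eps, lr_adam = 0.9, 0.999, 1e-8, 0.1  # assumed values

# Phase 1: Adam updates with bias correction.
for t in range(1, 101):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr_adam * m_hat / (math.sqrt(v_hat) + eps)

# Phase 2: switch to plain SGD with a fixed step size.
lr_sgd = 0.1
for _ in range(100):
    w -= lr_sgd * grad(w)

print(abs(w - 3.0) < 1e-3)
```

The sketch only shows the mechanics of handing off the iterate from one optimizer to the other; the paper's contribution concerns when to switch and which SGD learning rate to adopt.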

References (3)
Yoshua Bengio, Kyunghyun Cho, Dzmitry Bahdanau. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv: Computation and Language, 2014.
Takuya Akiba, Keisuke Fukuda, Shuji Suzuki. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. arXiv: Distributed, Parallel, and Cluster Computing, 2017.
Satyen Kale, Sashank J. Reddi, Sanjiv Kumar. On the Convergence of Adam and Beyond. International Conference on Learning Representations, 2018.