Improving Generalization Performance by Switching from Adam to SGD.

Authors: Richard Socher, Nitish Shirish Keskar

DOI:

Keywords:

Abstract: … and ascribe the poor generalization performance to training issues arising … the generalization performance of AMSGrad to be similar to that of Adam on problems where a generalization …
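The title describes switching from Adam to SGD during training. A minimal illustrative sketch of that idea on a toy objective, in plain Python (the hyperparameters and switch point are hypothetical choices, not the paper's actual switching criterion or learning-rate selection):

```python
# Illustrative only: minimize f(w) = (w - 3)^2 by running Adam for a
# fixed number of steps, then switching to plain SGD for the rest.
import math

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0
m, v = 0.0, 0.0
beta1, beta2, eps, lr_adam = 0.9, 0.999, 1e-8, 0.1  # assumed values

# Phase 1: Adam updates with bias correction.
for t in range(1, 101):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr_adam * m_hat / (math.sqrt(v_hat) + eps)

# Phase 2: switch to plain SGD with a fixed step size.
lr_sgd = 0.1
for _ in range(100):
    w -= lr_sgd * grad(w)

print(abs(w - 3.0) < 1e-3)
```

The sketch only shows the mechanics of handing off the iterate from one optimizer to the other; the paper's contribution concerns when to switch and which SGD learning rate to adopt.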

References (3)
Yoshua Bengio, Kyunghyun Cho, Dzmitry Bahdanau. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv: Computation and Language, 2014.
Takuya Akiba, Keisuke Fukuda, Shuji Suzuki. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. arXiv: Distributed, Parallel, and Cluster Computing, 2017.
Satyen Kale, Sashank J. Reddi, Sanjiv Kumar. On the Convergence of Adam and Beyond. International Conference on Learning Representations, 2018.