Follow the Signs for Robust Stochastic Optimization.

Authors: Philipp Hennig, Lukas Balles

DOI:

Keywords:

Abstract: Stochastic noise on gradients is now a common feature in machine learning. It complicates the design of optimization algorithms, and its effect can be unintuitive: we show that in some settings, particularly those with a low signal-to-noise ratio, it is helpful to discard all but the signs of the stochastic gradient elements. In fact, we argue that three popular existing methods already approximate this very paradigm. We devise novel algorithms that explicitly follow sign estimates while appropriately accounting for their uncertainty. These compare favorably with the state of the art on a number of benchmark problems.
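The core idea the abstract describes, keeping only the sign of each stochastic gradient element, can be illustrated with a minimal sign-SGD update. This is a toy sketch on a noisy quadratic, not the paper's proposed algorithms (which additionally account for sign uncertainty); the function name, step size, and noise level are illustrative assumptions.

```python
import numpy as np

def sign_step(params, grad_sample, lr=0.05):
    """One update that discards gradient magnitudes and
    follows only the elementwise signs (illustrative sketch)."""
    return params - lr * np.sign(grad_sample)

# Toy problem: f(w) = 0.5 * ||w||^2, so the true gradient is w.
# Gaussian noise with a large scale gives a low signal-to-noise ratio.
rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
for _ in range(200):
    noisy_grad = w + rng.normal(scale=2.0, size=w.shape)
    w = sign_step(w, noisy_grad)
```

Because each coordinate moves by a fixed amount per step, a single large noise sample cannot cause a large erroneous update; as long as the sign of the noisy gradient is correct more often than not, the iterate drifts toward the optimum.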
