Abstract: Stochastic noise on gradients is now a common feature in machine learning. It complicates the design of optimization algorithms, and its effect can be unintuitive: we show that in some settings, particularly those with a low signal-to-noise ratio, it is helpful to discard all but the signs of the stochastic gradient elements. In fact, we argue that three popular existing methods already approximate this very paradigm. We devise novel algorithms that explicitly follow sign estimates while appropriately accounting for their uncertainty. These methods compare favorably with the state of the art on a number of benchmark problems.
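To make the sign-based paradigm concrete, here is a minimal sketch of a plain sign-descent update that keeps only the elementwise sign of a stochastic gradient. The function name `sign_sgd_step`, the learning rate, and the toy noisy-gradient example are illustrative assumptions, not the paper's algorithms, which additionally account for the uncertainty of the sign estimates:

```python
import numpy as np

def sign_sgd_step(params, grad, lr=0.01):
    """One update that keeps only the elementwise sign of the
    stochastic gradient, discarding its magnitude (sketch only)."""
    return params - lr * np.sign(grad)

# Toy example: noisy gradient of f(x) = 0.5 * ||x||^2 at x,
# with noise large relative to the signal (low signal-to-noise ratio).
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
noisy_grad = x + rng.normal(scale=2.0, size=x.shape)
x = sign_sgd_step(x, noisy_grad)
print(x)
```

In this sketch every coordinate moves by the same fixed step `lr`, so a single very noisy gradient element cannot dominate the update, which is the intuition behind discarding magnitudes in low signal-to-noise regimes.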