Tighter Theory for Local SGD on Identical and Heterogeneous Data.

Authors: Peter Richtárik, Konstantin Mishchenko, Ahmed Khaled

Abstract: We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
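
For readers unfamiliar with the method, the following is a minimal sketch of the local SGD scheme the abstract refers to: each worker takes $H$ local stochastic gradient steps, then all workers average their iterates. The toy objective $f_m(x) = \tfrac{1}{2}(x - b_m)^2$, the function name `local_sgd`, and all parameter names are illustrative assumptions and are not taken from the paper. Distinct values of $b_m$ stand in for heterogeneous data, and identical values for the identical-data regime.

```python
import random


def local_sgd(targets, H, rounds, stepsize, noise_std=0.0, seed=0):
    """Toy local SGD on f_m(x) = 0.5 * (x - targets[m])**2 (assumed objective).

    Each worker m starts a round from the shared iterate, takes H local
    stochastic gradient steps, and then all workers average their iterates.
    """
    rng = random.Random(seed)
    x = 0.0  # shared iterate, restored after every synchronization
    for _ in range(rounds):
        local_iterates = []
        for b in targets:
            x_m = x
            for _ in range(H):
                # Exact gradient (x_m - b) plus zero-mean Gaussian noise.
                grad = (x_m - b) + rng.gauss(0.0, noise_std)
                x_m -= stepsize * grad
            local_iterates.append(x_m)
        # Communication step: average the local iterates.
        x = sum(local_iterates) / len(local_iterates)
    return x


if __name__ == "__main__":
    # Two workers with different local minimizers (heterogeneous toy data).
    print(local_sgd(targets=[1.0, 3.0], H=4, rounds=50, stepsize=0.1))
```

Setting `H=1` makes the workers average after every step, which is the regime in which the abstract says known results are recovered. In this toy problem all workers share the same curvature, so it does not reproduce the heterogeneity effects the paper studies; it only illustrates the structure of the algorithm.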
