FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization

Authors: Ramtin Pedarsani, Ali Jadbabaie, Aryan Mokhtari, Amirhossein Reisizadeh, Hamed Hassani

DOI:

Keywords:

Abstract: Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges, which include (i) a communication bottleneck, since a large number of devices upload their local updates to a parameter server, and (ii) scalability, as the federated network consists of millions of devices. Due to these systems challenges as well as issues related to statistical heterogeneity of data and privacy concerns, designing a provably efficient federated learning method is of significant importance, yet it remains challenging. In this paper, we present FedPAQ, a communication-efficient Federated learning method with Periodic Averaging and Quantization. FedPAQ relies on three key features: (1) periodic averaging, where models are updated locally at devices and only periodically averaged at the server; (2) partial device participation, where only a fraction of devices participate in each round of training; and (3) quantized message-passing, where the edge nodes quantize their updates before uploading to the parameter server. These features address the communication and scalability challenges in federated learning. We also show that FedPAQ achieves near-optimal theoretical guarantees for strongly convex and non-convex loss functions and empirically demonstrate the communication-computation tradeoff provided by our method.
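
To make the three features concrete, below is a minimal sketch of one FedPAQ-style training loop, assuming a toy least-squares loss per device, plain local SGD, and a simple unbiased stochastic quantizer. The helper names `quantize` and `local_sgd`, the data shapes, and all hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of FedPAQ-style rounds: periodic averaging, partial participation,
# and quantized uploads, on a toy per-device least-squares problem.
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, levels=4):
    """Unbiased stochastic quantizer: randomized rounding of |v_i|/||v|| to `levels` levels."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < (scaled - lower))  # randomized rounding
    return np.sign(v) * norm * q / levels

def local_sgd(x, data, tau=5, lr=0.1):
    """Run `tau` local SGD steps on this device's least-squares loss."""
    A, b = data
    for _ in range(tau):
        i = rng.integers(len(b))                 # one sample per step
        grad = A[i] * (A[i] @ x - b[i])
        x = x - lr * grad
    return x

# Toy federated setup: n devices, each holding its own local dataset.
n, d = 20, 10
devices = [(rng.standard_normal((30, d)), rng.standard_normal(30)) for _ in range(n)]
x_server = np.zeros(d)

for rnd in range(50):
    # (2) Partial participation: only a fraction of devices join this round.
    selected = rng.choice(n, size=n // 4, replace=False)
    updates = []
    for k in selected:
        # (1) Periodic averaging: tau local steps starting from the server model.
        x_local = local_sgd(x_server.copy(), devices[k], tau=5)
        # (3) Quantized message-passing: quantize the model difference before upload.
        updates.append(quantize(x_local - x_server))
    # Server averages the quantized updates and applies them to the global model.
    x_server = x_server + np.mean(updates, axis=0)
```

In this sketch, communication is reduced both by uploading only every `tau` local steps and by sending quantized model differences rather than full-precision vectors, which mirrors the tradeoff the abstract describes.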
