Distributed deep learning on edge-devices: Feasibility via adaptive compression

Authors: Corentin Hardy, Erwan Le Merrer, Bruno Sericola

DOI: 10.1109/NCA.2017.8171350

Keywords: Stochastic gradient descent; Embedding; MNIST database; Active learning (machine learning); Edge device; Real-time computing; Deep learning; Artificial intelligence; Server; Computer science; Online machine learning; Asynchronous communication

Abstract: A large portion of data mining and analytic services use modern machine learning techniques, such as deep learning. The state-of-the-art results of deep learning come at the price of intensive use of computing resources. The leading frameworks (e.g., TensorFlow) are executed on GPUs or on high-end servers in datacenters. At the other end, there is a proliferation of personal devices with possibly free CPU cycles; this can enable services to run in users' homes, embedding machine learning operations. In this paper, we ask the following question: Is distributed deep learning computation on WAN-connected devices feasible, in spite of the traffic caused by learning tasks? We show that such a setup raises some important challenges, most notably the ingress traffic that the servers hosting the up-to-date model have to sustain. In order to reduce this stress, we propose AdaComp, a novel algorithm for compressing worker updates to the model on the server. Applicable to stochastic gradient descent based approaches, it combines efficient gradient selection and learning rate modulation. We then experiment and measure the impact of compression, device heterogeneity and reliability on the accuracy of learned models, with an emulator platform that embeds TensorFlow into Linux containers. We report a reduction of the total amount of data sent by workers to the server by two orders of magnitude (e.g., a 191-fold reduction for a convolutional network on the MNIST dataset), when compared to standard asynchronous stochastic gradient descent, while preserving model accuracy.
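To make the idea of compressed worker updates concrete, the sketch below illustrates one common form of update compression for asynchronous SGD: a worker keeps only the largest-magnitude gradient entries and the server damps the learning rate for stale updates. This is a minimal sketch under assumptions, not the paper's AdaComp algorithm; the `keep_ratio` parameter and the staleness-based rate modulation rule are illustrative choices.

```python
# Minimal sketch of sparsified worker updates for asynchronous SGD.
# NOT the authors' AdaComp implementation: keep_ratio and the
# staleness-based damping below are illustrative assumptions.
import numpy as np

def compress_update(gradient, keep_ratio=0.01):
    """Keep only the largest-magnitude entries of a flat gradient.

    Returns (indices, values): a sparse representation roughly
    keep_ratio * gradient.size large, i.e. what a worker could send
    to the parameter server instead of the dense update.
    """
    flat = gradient.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # top-k selection by magnitude
    return idx, flat[idx]

def apply_sparse_update(params, idx, values, lr, staleness=0):
    """Server-side application of a sparse worker update.

    The learning rate is damped with the update's staleness (number of
    model versions elapsed since the worker pulled its copy); this
    modulation rule is an assumption made for illustration.
    """
    flat = params.ravel()  # view: updates params in place
    flat[idx] -= lr / (1.0 + staleness) * values
    return params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.normal(size=(784, 10))   # toy model weights
    grad = rng.normal(size=params.shape)  # toy gradient from one worker
    idx, vals = compress_update(grad, keep_ratio=0.01)
    print(f"sent {idx.size} of {grad.size} values "
          f"({grad.size / idx.size:.0f}-fold reduction)")
    apply_sparse_update(params, idx, vals, lr=0.1, staleness=2)
```

In this toy run the worker transmits about 1% of the gradient entries, which is the kind of ingress-traffic reduction at the server that the abstract quantifies (two orders of magnitude on MNIST).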
