Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks

Authors: Byung-Gon Chun, Eunji Jeong, Soojeong Kim, Hyeonmin Ha, Sanha Lee

DOI:

Keywords: Artificial neural network, Throughput (business), Computer science, Speedup, Artificial intelligence, Contextual image classification, Parallax, Deep learning, Computer engineering

Abstract: The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). DL frameworks such as TensorFlow, MXNet, and Caffe2 have emerged to assist researchers in training their models in a distributed manner. Although current frameworks scale well for image classification models, there remain opportunities for scalable distributed training of natural language processing (NLP) models. We found that current frameworks show relatively low scalability on NLP models due to the lack of consideration for differences in the sparsity of model parameters. In this paper, we propose Parallax, a framework that optimizes data parallel training by utilizing parameter sparsity. Parallax introduces a hybrid approach that combines Parameter Server and AllReduce architectures to optimize the amount of data transfer according to sparsity. Experiments show that Parallax, built atop TensorFlow, achieves scalable training throughput on both dense and sparse models while requiring little effort from its users. Parallax achieves up to 2.8x and 6.02x speedup for NLP models over TensorFlow and Horovod, respectively, with 48 GPUs. Its training speed for image classification models is equal to Horovod and 1.53x faster than TensorFlow.
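To illustrate the hybrid aggregation idea described in the abstract, the following is a minimal, hypothetical sketch in plain Python (not the Parallax API): variables whose gradients touch only a small fraction of rows per step (e.g., embedding tables) are routed to a Parameter Server path, where only the updated rows need to be transferred, while densely updated variables (e.g., convolution or LSTM kernels) use bandwidth-optimal AllReduce. The Variable fields, the avg_updated_fraction statistic, and the sparsity_threshold heuristic are illustrative assumptions.

    # Hypothetical sketch of sparsity-aware aggregation routing.
    # This is not the Parallax API; names and thresholds are assumptions.
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Variable:
        name: str
        num_elements: int            # total parameter count
        avg_updated_fraction: float  # fraction of rows touched per step (1.0 = dense)


    def plan_aggregation(variables: List[Variable],
                         sparsity_threshold: float = 0.5) -> Dict[str, str]:
        """Assign each variable to an aggregation path.

        Sparsely updated variables go through a Parameter Server, which only
        transfers the touched rows; densely updated variables go through
        AllReduce, which exchanges the full gradient with collective ops.
        """
        plan = {}
        for v in variables:
            if v.avg_updated_fraction < sparsity_threshold:
                plan[v.name] = "parameter_server"  # send only the updated rows
            else:
                plan[v.name] = "allreduce"         # exchange the full dense gradient
        return plan


    if __name__ == "__main__":
        model = [
            Variable("embedding/table", num_elements=50_000 * 512,
                     avg_updated_fraction=0.002),  # few rows touched per batch
            Variable("lstm/kernel", num_elements=4 * 512 * 2048,
                     avg_updated_fraction=1.0),    # fully dense gradient
        ]
        for name, path in plan_aggregation(model).items():
            print(f"{name:20s} -> {path}")

Under this sketch, the embedding table is assigned to the Parameter Server path and the LSTM kernel to AllReduce, which mirrors the data-transfer trade-off the abstract attributes to the hybrid design.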
