Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks

Authors: Byung-Gon Chun, Eunji Jeong, Soojeong Kim, Hyeonmin Ha, Sanha Lee

DOI:

Keywords: Artificial neural network, Throughput (business), Computer science, Speedup, Artificial intelligence, Contextual image classification, Parallax, Deep learning, Computer engineering

Abstract: The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). DL frameworks such as TensorFlow, MXNet, and Caffe2 have emerged to assist researchers in training their models in a distributed manner. Although current frameworks scale well for image classification models, there remain opportunities for scalable distributed training of natural language processing (NLP) models. We found that current frameworks show relatively low scalability on NLP models due to the lack of consideration for differences in the sparsity of model parameters. In this paper, we propose Parallax, a framework that optimizes data parallel training by utilizing parameter sparsity. Parallax introduces a hybrid approach that combines Parameter Server and AllReduce architectures to optimize the amount of data transfer according to sparsity. Experiments show that Parallax, built atop TensorFlow, achieves scalable training throughput on both dense and sparse models while requiring little effort from its users. Parallax achieves up to 2.8x and 6.02x speedup for NLP models over TensorFlow and Horovod, respectively, with 48 GPUs. Its training speed for image classification models is equal to Horovod and 1.53x faster than TensorFlow.
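To illustrate the hybrid aggregation idea described in the abstract, the following is a minimal, hypothetical sketch in plain Python (not the Parallax API): variables whose gradients touch only a small fraction of rows per step (e.g., embedding tables) are routed to a Parameter Server path, where only the updated rows need to be transferred, while densely updated variables (e.g., convolution or LSTM kernels) use bandwidth-optimal AllReduce. The Variable fields, the avg_updated_fraction statistic, and the sparsity_threshold heuristic are illustrative assumptions.

    # Hypothetical sketch of sparsity-aware aggregation routing.
    # This is not the Parallax API; names and thresholds are assumptions.
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Variable:
        name: str
        num_elements: int            # total parameter count
        avg_updated_fraction: float  # fraction of rows touched per step (1.0 = dense)


    def plan_aggregation(variables: List[Variable],
                         sparsity_threshold: float = 0.5) -> Dict[str, str]:
        """Assign each variable to an aggregation path.

        Sparsely updated variables go through a Parameter Server, which only
        transfers the touched rows; densely updated variables go through
        AllReduce, which exchanges the full gradient with collective ops.
        """
        plan = {}
        for v in variables:
            if v.avg_updated_fraction < sparsity_threshold:
                plan[v.name] = "parameter_server"  # send only the updated rows
            else:
                plan[v.name] = "allreduce"         # exchange the full dense gradient
        return plan


    if __name__ == "__main__":
        model = [
            Variable("embedding/table", num_elements=50_000 * 512,
                     avg_updated_fraction=0.002),  # few rows touched per batch
            Variable("lstm/kernel", num_elements=4 * 512 * 2048,
                     avg_updated_fraction=1.0),    # fully dense gradient
        ]
        for name, path in plan_aggregation(model).items():
            print(f"{name:20s} -> {path}")

Under this sketch, the embedding table is assigned to the Parameter Server path and the LSTM kernel to AllReduce, which mirrors the data-transfer trade-off the abstract attributes to the hybrid design.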
