Authors: Byung-Gon Chun, Eunji Jeong, Soojeong Kim, Hyeonmin Ha, Sanha Lee
DOI:
Keywords: Artificial neural network, Throughput (business), Computer science, Speedup, Artificial intelligence, Contextual image classification, Parallax, Deep learning, Computer engineering
Abstract: The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). DL frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to assist researchers in training their models in a distributed manner. Although current frameworks scale well for image classification models, there remain opportunities for scalable training of natural language processing (NLP) models. We found that current frameworks show relatively low scalability on NLP models due to the lack of consideration for the difference in sparsity of model parameters. In this paper, we propose Parallax, a framework that optimizes data parallel training by utilizing the sparsity of model parameters. Parallax introduces a hybrid approach that combines Parameter Server and AllReduce architectures to optimize the amount of data transfer according to the sparsity. Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on both dense and sparse models while requiring little effort from its users. Parallax achieves up to 2.8x and 6.02x speedup over TensorFlow and Horovod with 48 GPUs, respectively. Its training speed for image classification models is equal to Horovod and 1.53x faster than TensorFlow.
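The abstract's hybrid Parameter Server / AllReduce idea can be illustrated with a minimal sketch. The code below is not Parallax's actual API; all function names, thresholds, and data are hypothetical stand-ins that only mimic the dispatch logic: dense gradients are averaged with an AllReduce-style reduction across workers, while sparse gradients (e.g., embedding rows touched by a few indices) are summed on a central server so that only the updated rows are communicated.

```python
# Hypothetical sketch of sparsity-aware gradient aggregation (not Parallax code).
import numpy as np

def is_sparse(grad_indices, num_rows, threshold=0.1):
    # Treat a gradient as sparse if only a small fraction of rows is updated.
    return len(set(grad_indices)) / num_rows < threshold

def allreduce(dense_grads):
    # Average dense gradients across workers (stand-in for a ring AllReduce).
    return np.mean(np.stack(dense_grads), axis=0)

def parameter_server_aggregate(sparse_grads, num_rows, dim):
    # Average sparse (indices, values) updates on a central server, then broadcast.
    total = np.zeros((num_rows, dim))
    count = np.zeros(num_rows)
    for indices, values in sparse_grads:
        for i, v in zip(indices, values):
            total[i] += v
            count[i] += 1
    touched = count > 0
    total[touched] /= count[touched, None]
    return total

# Example: two workers, one dense weight gradient and one sparse embedding gradient.
dense_from_workers = [np.ones((4, 4)), 3 * np.ones((4, 4))]
sparse_from_workers = [([0, 2], np.ones((2, 8))), ([2], np.ones((1, 8)))]

dense_update = allreduce(dense_from_workers)                                      # AllReduce path
sparse_update = parameter_server_aggregate(sparse_from_workers, num_rows=10, dim=8)  # PS path
```

Routing each parameter to the cheaper of the two transfer paths is what the abstract credits for the throughput gains on sparse NLP models while matching AllReduce-based systems on dense image classification models.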