Accelerating Multi-Model Inference by Merging DNNs of Different Weights.

Authors: Byung-Gon Chun, Yunseong Lee, Soojeong Kim, Gyeong-In Yu, Joo Seong Jeong

DOI:

Keywords: Machine learning, Titan (supercomputer), Inference, Computer science, Speedup, Transfer of learning, Server, Multi model inference, Artificial intelligence, Set (abstract data type)

Abstract: Standardized DNN models that have been proven to perform well on machine learning tasks are widely used and often adopted as-is to solve downstream tasks, forming the transfer learning paradigm. However, when serving multiple instances of such models from a cluster of GPU servers, existing techniques for improving utilization, such as batching, are inapplicable because the instances do not share weights due to fine-tuning. We propose NetFuse, a technique for merging DNN models that have the same architecture but different weights and different inputs. NetFuse is made possible by replacing common DNN operations with more general counterparts that allow a set of weights to be associated with only a certain set of inputs. Experiments on ResNet-50, ResNeXt-50, BERT, and XLNet show that NetFuse can speed up inference time by up to 3.6x on an NVIDIA V100 GPU and 3.0x on a TITAN Xp GPU with 32 model instances, while using only a small additional amount of memory.
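As a rough illustration of the merging idea in the abstract (a minimal sketch, not NetFuse's actual implementation), the following NumPy snippet stacks the weights of M fine-tuned dense layers that share an architecture but not weights, and replaces M separate matrix multiplications with a single batched contraction in which each weight set is applied only to its own group of inputs. All names and dimensions here are hypothetical.

```python
import numpy as np

# Hypothetical sketch: serve M fine-tuned "dense" layers that share an
# architecture but not weights, by stacking the per-model weights and
# running one batched contraction instead of M separate matmuls.

M, B, D_in, D_out = 32, 4, 512, 256      # model instances, per-model batch, dims

# One weight matrix and bias per fine-tuned model instance.
weights = np.random.randn(M, D_in, D_out).astype(np.float32)
biases = np.random.randn(M, D_out).astype(np.float32)

# Inputs grouped by model: x[m] holds the batch destined for model m,
# so each set of weights is associated only with its own inputs.
x = np.random.randn(M, B, D_in).astype(np.float32)

# Unmerged baseline: M separate layer executions.
separate = np.stack([x[m] @ weights[m] + biases[m] for m in range(M)])

# Merged execution: one batched contraction over all M weight sets.
merged = np.einsum('mbi,mio->mbo', x, weights) + biases[:, None, :]

assert np.allclose(separate, merged, atol=1e-4)
```

On a GPU, the merged form launches one large kernel instead of M small ones; consolidation of this kind is presumably where the speedups reported in the abstract come from.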

References (30)
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, vol. 115, pp. 211–252 (2015). doi:10.1007/s11263-015-0816-y
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson. How Transferable Are Features in Deep Neural Networks? Advances in Neural Information Processing Systems, vol. 27, pp. 3320–3328 (2014)
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105 (2012)
Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David J. Crandall, Dhruv Batra. Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks. arXiv preprint (2015)
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016). doi:10.1109/CVPR.2016.308
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). doi:10.1109/CVPR.2016.90
Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agarwal, Alec Wolman, Arvind Krishnamurthy. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints. International Conference on Mobile Systems, Applications, and Services (MobiSys), pp. 123–136 (2016). doi:10.1145/2906388.2906396
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. Aggregated Residual Transformations for Deep Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017). doi:10.1109/CVPR.2017.634
Brian Chu, Vashisht Madhavan, Oscar Beijbom, Judy Hoffman, Trevor Darrell. Best Practices for Fine-Tuning Visual Classifiers to New Domains. European Conference on Computer Vision (ECCV), pp. 435–442 (2016). doi:10.1007/978-3-319-49409-8_34
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint (2017)