Accelerating Multi-Model Inference by Merging DNNs of Different Weights.

Authors: Byung-Gon Chun, Yunseong Lee, Soojeong Kim, Gyeong-In Yu, Joo Seong Jeong

DOI:

Keywords: Machine learning, Titan (supercomputer), Inference, Computer science, Speedup, Transfer of learning, Server, Multi model inference, Artificial intelligence, Set (abstract data type)

Abstract: Standardized DNN models that have been proven to perform well on machine learning tasks are widely used and often adopted as-is to solve downstream tasks, forming the transfer learning paradigm. However, when serving multiple instances of such models from a cluster of GPU servers, existing techniques for improving utilization, such as batching, are inapplicable because the instances do not share weights due to fine-tuning. We propose NetFuse, a technique for merging DNN models that have the same architecture but different weights and different inputs. NetFuse is made possible by replacing common DNN operations with more general counterparts that allow a set of weights to be associated with only a certain set of inputs. Experiments on ResNet-50, ResNeXt-50, BERT, and XLNet show that NetFuse can speed up inference time by up to 3.6x on an NVIDIA V100 GPU and 3.0x on a TITAN Xp GPU with 32 model instances, while using only a small additional amount of memory.
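As a rough illustration of the merging idea in the abstract (a minimal sketch, not NetFuse's actual implementation), the following NumPy snippet stacks the weights of M fine-tuned dense layers that share an architecture but not weights, and replaces M separate matrix multiplications with a single batched contraction in which each weight set is applied only to its own group of inputs. All names and dimensions here are hypothetical.

```python
import numpy as np

# Hypothetical sketch: serve M fine-tuned "dense" layers that share an
# architecture but not weights, by stacking the per-model weights and
# running one batched contraction instead of M separate matmuls.

M, B, D_in, D_out = 32, 4, 512, 256      # model instances, per-model batch, dims

# One weight matrix and bias per fine-tuned model instance.
weights = np.random.randn(M, D_in, D_out).astype(np.float32)
biases = np.random.randn(M, D_out).astype(np.float32)

# Inputs grouped by model: x[m] holds the batch destined for model m,
# so each set of weights is associated only with its own inputs.
x = np.random.randn(M, B, D_in).astype(np.float32)

# Unmerged baseline: M separate layer executions.
separate = np.stack([x[m] @ weights[m] + biases[m] for m in range(M)])

# Merged execution: one batched contraction over all M weight sets.
merged = np.einsum('mbi,mio->mbo', x, weights) + biases[:, None, :]

assert np.allclose(separate, merged, atol=1e-4)
```

On a GPU, the merged form launches one large kernel instead of M small ones; consolidation of this kind is presumably where the speedups reported in the abstract come from.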

References (30)
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, vol. 115, pp. 211–252 (2015). doi:10.1007/s11263-015-0816-y
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson. How Transferable Are Features in Deep Neural Networks? Advances in Neural Information Processing Systems, vol. 27, pp. 3320–3328 (2014)
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105 (2012)
Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David J. Crandall, Dhruv Batra. Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks. arXiv preprint (2015)
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016). doi:10.1109/CVPR.2016.308
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). doi:10.1109/CVPR.2016.90
Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agarwal, Alec Wolman, Arvind Krishnamurthy. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints. International Conference on Mobile Systems, Applications, and Services (MobiSys), pp. 123–136 (2016). doi:10.1145/2906388.2906396
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. Aggregated Residual Transformations for Deep Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017). doi:10.1109/CVPR.2017.634
Brian Chu, Vashisht Madhavan, Oscar Beijbom, Judy Hoffman, Trevor Darrell. Best Practices for Fine-Tuning Visual Classifiers to New Domains. European Conference on Computer Vision (ECCV), pp. 435–442 (2016). doi:10.1007/978-3-319-49409-8_34
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint (2017)