Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge

Authors: Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge

DOI: 10.1145/3037697.3037698

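Abstract: The computation for today's intelligent personal assistants such as Apple Siri, Google Now, and Microsoft Cortana is performed in the cloud. This cloud-only approach requires significant amounts of data to be sent to the cloud over the wireless network and puts significant computational pressure on the datacenter. However, as the computational resources in mobile devices become more powerful and energy efficient, questions arise as to whether this cloud-only processing is desirable moving forward, and what the implications are of pushing some or all of this compute to the mobile devices on the edge. In this paper, we examine the status quo approach of cloud-only processing and investigate computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption, and high datacenter throughput for this class of intelligent applications. Our study uses 8 intelligent applications spanning computer vision, speech, and natural language domains, all employing state-of-the-art Deep Neural Networks (DNNs) as the core machine learning technique. We find that, given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach. Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. Neurosurgeon does not require per-application profiling. It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, intelligently partitioning computation for best latency or best mobile energy. We evaluate Neurosurgeon on a state-of-the-art mobile development platform and show that it improves end-to-end latency by 3.1X on average and up to 40.7X, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5X on average and up to 6.7X.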
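The layer-level partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `best_partition`, its parameters, and all numbers are hypothetical, and it assumes that per-layer latencies and output sizes are already known, that only the uplink transfer matters, and that latency (not energy) is the objective.

```python
from typing import Sequence, Tuple


def best_partition(
    mobile_ms: Sequence[float],      # assumed per-layer latency on the device
    server_ms: Sequence[float],      # assumed per-layer latency in the datacenter
    output_bytes: Sequence[float],   # assumed output size of each layer
    input_bytes: float,              # size of the raw input to the first layer
    uplink_bytes_per_ms: float,      # assumed wireless uplink bandwidth
) -> Tuple[int, float]:
    """Return (k, latency_ms): layers [0, k) run on the device, the
    intermediate data is uploaded, and layers [k, n) run in the datacenter."""
    n = len(mobile_ms)
    if not (len(server_ms) == len(output_bytes) == n):
        raise ValueError("per-layer sequences must have the same length")

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            transfer = 0.0  # mobile-only: nothing is uploaded
        else:
            # k == 0 is the cloud-only case, which uploads the raw input.
            payload = input_bytes if k == 0 else output_bytes[k - 1]
            transfer = payload / uplink_bytes_per_ms
        latency = sum(mobile_ms[:k]) + transfer + sum(server_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Illustrative numbers only (not measurements from the paper).
k, latency = best_partition(
    mobile_ms=[10, 40, 30],
    server_ms=[1, 4, 3],
    output_bytes=[1000, 100, 10],
    input_bytes=5000,
    uplink_bytes_per_ms=10,
)
print(k, latency)  # 2 63.0
```

With these made-up values, uploading the raw input (k = 0) costs 508 ms and running everything on the device (k = 3) costs 80 ms, while splitting after the second layer costs 63 ms because that layer's output is much smaller than the input. An energy-oriented variant would replace the latency terms with per-layer and per-byte energy estimates.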
