Authors: Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge
Keywords:
Abstract: The computation for today's intelligent personal assistants such as Apple Siri, Google Now, and Microsoft Cortana is performed in the cloud. This cloud-only approach requires significant amounts of data to be sent to the cloud over the wireless network and puts significant computational pressure on the datacenter. However, as the computational resources in mobile devices become more powerful and energy efficient, questions arise as to whether this cloud-only processing is desirable moving forward, and what the implications are of pushing some or all of this compute to the mobile devices on the edge.

In this paper, we examine the status quo approach and investigate computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption, and high datacenter throughput for this class of intelligent applications. Our study uses 8 intelligent applications spanning computer vision, speech, and natural language domains, all employing state-of-the-art Deep Neural Networks (DNNs) as the core machine learning technique. We find that, given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach.

Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. Neurosurgeon does not require per-application profiling. It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, and intelligently partitions computation for best latency or best mobile energy. We evaluate Neurosurgeon on a state-of-the-art mobile development platform and show that it improves end-to-end latency by 3.1X on average and up to 40.7X, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5X on average and up to 6.7X.
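The layer-level partitioning idea in the abstract can be sketched as a simple minimization: for each candidate split point, sum the mobile-side compute time, the time to upload the activations crossing the split, and the cloud-side compute time, then pick the cheapest split. The Python sketch below illustrates this under assumptions; `LayerProfile`, `choose_partition`, and all profile numbers are hypothetical stand-ins, not Neurosurgeon's actual implementation or API.

```python
# Minimal sketch of latency-optimal layer-level DNN partitioning.
# All names and numbers here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LayerProfile:
    name: str
    mobile_latency_ms: float   # predicted execution time on the mobile device
    cloud_latency_ms: float    # predicted execution time in the datacenter
    output_bytes: int          # size of this layer's output activations

def choose_partition(layers, uplink_bytes_per_ms, input_bytes):
    """Return (k, latency) minimizing end-to-end latency when layers[:k]
    run on the mobile device and layers[k:] run in the cloud.
    k == 0 is cloud-only; k == len(layers) is mobile-only."""
    best_k, best_latency = 0, float("inf")
    for k in range(len(layers) + 1):
        mobile = sum(l.mobile_latency_ms for l in layers[:k])
        cloud = sum(l.cloud_latency_ms for l in layers[k:])
        # Data crossing the partition point: the raw input if cloud-only,
        # the last mobile-side layer's output otherwise, and nothing at all
        # when the entire network runs on the device.
        if k == 0:
            transfer = input_bytes / uplink_bytes_per_ms
        elif k == len(layers):
            transfer = 0.0
        else:
            transfer = layers[k - 1].output_bytes / uplink_bytes_per_ms
        total = mobile + transfer + cloud
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency

# Made-up profile for a small conv net over a ~500 KB/s uplink. Early conv
# layers produce large activations; later layers shrink the data, which is
# what makes a mid-network split attractive.
layers = [
    LayerProfile("conv1", 10.0, 1.0, 1_200_000),
    LayerProfile("pool1",  1.5, 0.2,   300_000),
    LayerProfile("conv2", 15.0, 1.5,   600_000),
    LayerProfile("pool2",  1.0, 0.2,     8_000),
    LayerProfile("fc1",   40.0, 1.2,    16_000),
    LayerProfile("fc2",    6.0, 0.3,     4_000),
]
k, latency = choose_partition(layers, uplink_bytes_per_ms=500.0,
                              input_bytes=600_000)
print(f"best split after layer {k}: {latency:.1f} ms end-to-end")
```

With these assumed numbers the search picks a split after pool2, where the activations are small: cloud-only pays a large upload of the raw input, mobile-only pays for the compute-heavy fully connected layers on the device, and the mid-network split beats both. Optimizing for mobile energy instead of latency would use the same search with per-layer energy predictions in place of the latency terms.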