Authors: Valentin Radu, Jack Turner, José Cano, Elliot J Crowley, Michael O’Boyle
DOI:
Keywords:
Abstract: There is a growing demand in ubiquitous computing for performing ever more complex detection tasks at the edge of the Internet, on resource-constrained devices, driven by privacy concerns about transferring sensitive user data and by the need to operate in locations without a network connection. At the same time, deep learning has emerged as the dominant solution for improved detection accuracy in several areas of interest, including computer vision, speech, translation, and context detection [Radu et al., 2018]. Although outstandingly accurate, deep neural networks are known for their high computational demand, so running them on resource-constrained devices remains an open challenge. In the EU project Bonseyes, we are developing new solutions to facilitate the portability of deep neural networks to resource-constrained mobile devices. We define the Deep Learning Inference Stack (DLIS) as the set of techniques that work together to produce deep neural network based inferences, spread over the following layers: (1) Neural Network Model; (2) Machine Learning Compression Technique; (3) Data Format; (4) Computation and Workload Parallelization; (5) Hardware. For each of these layers we select specific candidates and evaluate their impact on performance in combinations across layers. Although the biggest gains are generally expected to come from layers (1) and (5), by designing smaller, more refined neural networks and by using specialized inference hardware respectively, our investigation focuses primarily on layers (2)-(4), which we believe still hold potential for further optimization. Promising candidates are selected at each …