Deepframe: A Profile-Driven Compiler for Spatial Hardware Accelerators

Authors: Apala Guha, Naveen Vedula, Arrvindh Shriraman

DOI: 10.1109/PACT.2019.00014


Abstract: Tracing code paths to form extended basic blocks is useful in many areas: compiler optimizations [1], improving instruction cache behavior [2], and custom-hardware offloading [3]. Prior work has been plagued by small traces, limited by either the overheads of dynamic profiling, statically available information [4], or side-exit branches [5]. In this work, we rethink what path sequences to fuse to construct long traces for spatial accelerators, while minimizing the occurrence of side exits, which limit coverage. We introduce a novel technique that recasts learning a program's execution patterns as a natural-language-processing problem, CBOW (Continuous Bag of Words). We then use a deep network to learn the relationships among the paths. During the compilation phase, Deepframe uses a sequence miner to decide which path sequences are likely to occur. The deep network predicts the Deepframe online, an extended basic block comprising multi-path sequences (each path itself composed of multiple basic blocks). We demonstrate the efficacy of Deepframe on hardware accelerators and find the following: i) We can construct up to 5x (max: 27x) longer offload regions compared to prior approaches. ii) Surprisingly far-flung ILP (instruction-level parallelism) and MLP (memory-level parallelism) can be mined from the frames (5.5x increase in ILP and 10.5x increase in MLP). iii) Deepframes offloaded to the accelerator have minimal side exits (mis-speculation) and achieve sufficient coverage to improve overall application performance (up to 9x improvement). We will be releasing as open source our end-to-end prototype based on LLVM.


References (42)

Rodrigo Sol, Christophe Guillon, Fernando Magno Quintão Pereira, Mariza A. S. Bigonha. Dynamic elimination of overflow tests in a trace compiler. Compiler Construction, pp. 2-21 (2011). DOI: 10.1007/978-3-642-19861-8_2

Sanjay J. Patel, Steven S. Lumetta. rePLay: A Hardware Framework for Dynamic Program Optimization. Coordinated Science Laboratory, University of Illinois at Urbana-Champaign (1999).

Thomas M. Conte, Kishore N. Menezes, Patrick M. Mills, Burzin A. Patel. Optimization of instruction fetch mechanisms for high issue rates. International Symposium on Computer Architecture, vol. 23, pp. 333-344 (1995). DOI: 10.1145/223982.224444

Thomas Ball, James R. Larus. Branch prediction for free. Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI '93), vol. 28, pp. 300-313 (1993). DOI: 10.1145/155090.155119

Shantanu Gupta, Shuguang Feng, Amin Ansari, Scott Mahlke, David August. Bundled execution of recurring traces for energy-efficient general purpose processing. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11), pp. 12-23 (2011). DOI: 10.1145/2155620.2155623

Michael A. Laurenzano, Yunqi Zhang, Lingjia Tang, Jason Mars. Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers. International Symposium on Microarchitecture, pp. 558-570 (2014). DOI: 10.1109/MICRO.2014.21

Matthew Arnold, Stephen Fink, Vivek Sarkar, Peter F. Sweeney. A comparative study of static and profile-based heuristics for inlining. SIGPLAN Notices, vol. 35, pp. 52-64 (2000). DOI: 10.1145/351397.351416

Daniel A. Jiménez, Calvin Lin. Neural methods for dynamic branch prediction. ACM Transactions on Computer Systems, vol. 20, pp. 369-397 (2002). DOI: 10.1145/571637.571639

Karthikeyan Sankaralingam, Tony Nowatzki, Vinay Gangadhar. Exploring the potential of heterogeneous von Neumann/dataflow execution models. International Symposium on Computer Architecture, vol. 43, pp. 298-310 (2015). DOI: 10.1145/2749469.2750380


Kevin M. Crozier, Qudus B. Olaniran, Wen-mei W. Hwu, Daniel A. Connors, John W. Sias, Scott A. Mahlke, David I. August, Patrick R. Eaton, Ben-Chung Cheng. Integrated predicated and speculative execution in the IMPACT EPIC architecture. International Symposium on Computer Architecture, vol. 26, pp. 227-237 (1998). DOI: 10.1145/279358.279391
