Deepframe: A Profile-Driven Compiler for Spatial Hardware Accelerators

Authors: Apala Guha, Naveen Vedula, Arrvindh Shriraman

DOI: 10.1109/PACT.2019.00014

Abstract: Tracing code paths to form extended basic blocks is useful in many areas, including compiler optimizations [1], improving instruction cache behavior [2] and custom-hardware offloading [3]. Prior work has been plagued by small traces, limited either by the overheads of dynamic profiling, by statically available information [4], or by side-exit branches [5]. In this work, we rethink which path sequences to fuse to construct long traces for spatial accelerators, while minimizing the occurrence of side exits, which limit coverage. We introduce a novel technique that recasts learning a program's execution patterns as a natural-language-processing problem, CBOW (Continuous Bag of Words). We then use a deep network to learn the relationships among paths. During the compilation phase, a sequence miner decides which path sequences are likely to occur. Deepframe uses the predicted sequences online to construct a multi-path extended block (each path itself composed of multiple basic blocks). We demonstrate the efficacy of Deepframe on hardware accelerators and find the following: i) Deepframe can construct up to 5x (max: 27x) longer offload regions compared to prior approaches. ii) Surprisingly, far-flung ILP (instruction-level parallelism) and MLP (memory-level parallelism) can be mined from the frames (a 5.5x increase in ILP and 10.5x in MLP). iii) The frames offloaded to the accelerator have minimal side exits (mis-speculation) and achieve sufficient coverage to improve overall application performance (up to 9x improvement). We will be releasing our end-to-end prototype, based on LLVM, as open source.
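The CBOW recasting described in the abstract can be illustrated with a small sketch in Python. It assumes the profile is available as a flat trace of integer path IDs, treats each path ID as a "word", builds (context, target) examples from a window of neighboring paths, and trains a single-layer CBOW model (the mean of the context embeddings followed by a softmax over path IDs) with plain SGD. The names `cbow_pairs`, `TinyCBOW`, and `train`, the trace, and all hyperparameters are hypothetical; this is not Deepframe's deep network, its sequence miner, or its block construction.

```python
import math
import random


def cbow_pairs(trace, window):
    """Turn a trace of path IDs into CBOW (context, target) examples.

    The context is the bag of up to `window` path IDs on each side of the
    target; order inside the bag is discarded, as in CBOW.
    """
    pairs = []
    for i, target in enumerate(trace):
        left = trace[max(0, i - window):i]
        right = trace[i + 1:i + 1 + window]
        context = sorted(left + right)
        if context:  # a trace of length 1 has no context to learn from
            pairs.append((context, target))
    return pairs


class TinyCBOW:
    """Single-layer CBOW: average the context embeddings, then softmax over paths."""

    def __init__(self, num_paths, dim, seed=0):
        rng = random.Random(seed)
        self.emb_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                       for _ in range(num_paths)]
        self.emb_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                        for _ in range(num_paths)]

    def _hidden(self, context):
        # Mean of the input embeddings of the context paths.
        dim = len(self.emb_in[0])
        h = [0.0] * dim
        for c in context:
            for j in range(dim):
                h[j] += self.emb_in[c][j] / len(context)
        return h

    def probs(self, context):
        # Softmax over all path IDs, stabilized by subtracting the max score.
        h = self._hidden(context)
        scores = [sum(w * x for w, x in zip(row, h)) for row in self.emb_out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, context):
        # Most likely path ID given the bag of surrounding paths.
        p = self.probs(context)
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, context, target, lr):
        """One SGD step on cross-entropy loss; returns the loss before the step."""
        h = self._hidden(context)
        p = self.probs(context)
        grad_h = [0.0] * len(h)
        for t, row in enumerate(self.emb_out):
            g = p[t] - (1.0 if t == target else 0.0)  # d loss / d score_t
            for j in range(len(h)):
                grad_h[j] += g * row[j]  # uses the pre-update output weight
                row[j] -= lr * g * h[j]
        for c in context:
            for j in range(len(h)):
                self.emb_in[c][j] -= lr * grad_h[j] / len(context)
        return -math.log(p[target])


def train(pairs, num_paths, dim=8, epochs=200, lr=0.1, seed=0):
    """Train with per-example SGD; returns the model and mean loss per epoch."""
    model = TinyCBOW(num_paths, dim, seed)
    history = []
    for _ in range(epochs):
        epoch_loss = sum(model.train_step(c, t, lr) for c, t in pairs)
        history.append(epoch_loss / len(pairs))
    return model, history


if __name__ == "__main__":
    # Hypothetical profiled trace: 5 path IDs executing in a fixed cycle.
    trace = [0, 1, 2, 3, 4] * 10
    model, history = train(cbow_pairs(trace, window=1), num_paths=5)
    print("loss:", round(history[0], 3), "->", round(history[-1], 3))
    print("path between 0 and 2:", model.predict([0, 2]))
```

Because CBOW discards order within the window, the context {0, 2} yields the same prediction whether path 0 precedes or follows the target; a trace builder that must respect control-flow order would need ordering information from another source, such as a pass that mines likely path sequences.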

References (42)
Rodrigo Sol, Christophe Guillon, Fernando Magno Quintão Pereira, Mariza A. S. Bigonha, "Dynamic elimination of overflow tests in a trace compiler," Compiler Construction, pp. 2-21, 2011. DOI: 10.1007/978-3-642-19861-8_2
Sanjay J. Patel, Steven S. Lumetta, "rePLay: A Hardware Framework for Dynamic Program Optimization," Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1999.
Thomas M. Conte, Kishore N. Menezes, Patrick M. Mills, Burzin A. Patel, "Optimization of instruction fetch mechanisms for high issue rates," International Symposium on Computer Architecture, vol. 23, pp. 333-344, 1995. DOI: 10.1145/223982.224444
Thomas Ball, James R. Larus, "Branch prediction for free," Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI '93), vol. 28, pp. 300-313, 1993. DOI: 10.1145/155090.155119
Shantanu Gupta, Shuguang Feng, Amin Ansari, Scott Mahlke, David August, "Bundled execution of recurring traces for energy-efficient general purpose processing," Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11), pp. 12-23, 2011. DOI: 10.1145/2155620.2155623
Michael A. Laurenzano, Yunqi Zhang, Lingjia Tang, Jason Mars, "Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers," International Symposium on Microarchitecture, pp. 558-570, 2014. DOI: 10.1109/MICRO.2014.21
Matthew Arnold, Stephen Fink, Vivek Sarkar, Peter F. Sweeney, "A comparative study of static and profile-based heuristics for inlining," SIGPLAN Notices, vol. 35, pp. 52-64, 2000. DOI: 10.1145/351397.351416
Daniel A. Jiménez, Calvin Lin, "Neural methods for dynamic branch prediction," ACM Transactions on Computer Systems, vol. 20, pp. 369-397, 2002. DOI: 10.1145/571637.571639
Karthikeyan Sankaralingam, Tony Nowatzki, Vinay Gangadhar, "Exploring the potential of heterogeneous von Neumann/dataflow execution models," International Symposium on Computer Architecture, vol. 43, pp. 298-310, 2015. DOI: 10.1145/2749469.2750380
Kevin M. Crozier, Qudus B. Olaniran, Wen-mei W. Hwu, Daniel A. Connors, John W. Sias, Scott A. Mahlke, David I. August, Patrick R. Eaton, Ben-Chung Cheng, "Integrated predicated and speculative execution in the IMPACT EPIC architecture," International Symposium on Computer Architecture, vol. 26, pp. 227-237, 1998. DOI: 10.1145/279358.279391