Authors: Apala Guha, Naveen Vedula, Arrvindh Shriraman
Keywords:
Abstract: Tracing code paths to form extended basic blocks is useful in many areas, including compiler optimizations [1], improving instruction cache behavior [2], and custom-hardware offloading [3]. Prior work has been plagued by small traces, limited by either the overheads of dynamic profiling, statically available information [4], or side-exit branches [5]. In this work, we rethink what path sequences to fuse to construct long traces for spatial accelerators, while minimizing the occurrence of side exits, which limit coverage. We introduce a novel technique that recasts learning a program's execution patterns as a natural-language-processing problem, CBOW (Continuous Bag of Words). We then use a deep network to learn the relationships among paths. During the compilation phase, Deepframe uses a sequence miner to decide what path sequences are likely to occur. The deep network predicts the Deepframe online, an extended basic block comprising a multi-path sequence (each path itself composed of multiple basic blocks). We demonstrate the efficacy of Deepframe on hardware accelerators and find the following: i) We can construct up to 5x (max: 27x) longer offload regions compared to prior approaches. ii) Surprisingly, far-flung ILP (instruction-level parallelism) and MLP (memory-level parallelism) can be mined from the frames (5.5x increase in ILP and 10.5x in MLP). iii) The regions offloaded to the accelerator have minimal side exits (mis-speculation) and achieve sufficient coverage to improve overall application performance (up to 9x improvement). We will be releasing our end-to-end prototype, based on LLVM, as open source.
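To make the CBOW framing and the sequence-mining step concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a profiled execution is available as a list of integer path IDs; the function names `cbow_pairs` and `frequent_sequences`, the window size, and the count threshold are hypothetical choices made for illustration. It shows only how (context, target) training pairs and frequent contiguous path sequences could be derived from such a trace, not how the deep network is trained.

```python
from collections import Counter


def cbow_pairs(trace, window=2):
    """Build CBOW-style (context, target) pairs from a path-ID trace.

    Each path ID plays the role of a word; its context is the path IDs
    that executed up to `window` positions before and after it.
    """
    pairs = []
    for i, target in enumerate(trace):
        left = trace[max(0, i - window):i]
        right = trace[i + 1:i + 1 + window]
        pairs.append((tuple(left + right), target))
    return pairs


def frequent_sequences(trace, length=2, min_count=2):
    """Keep contiguous path sequences of `length` seen at least `min_count` times."""
    counts = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical trace: each integer is the ID of one executed path.
trace = [3, 7, 3, 7, 9, 3, 7]
pairs = cbow_pairs(trace, window=1)
frequent = frequent_sequences(trace, length=2, min_count=2)
```

In this toy trace, the path pair (3, 7) recurs and would be a candidate for fusing into a longer frame, while the pairs from `cbow_pairs` are the kind of input a CBOW-style model would consume.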