Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code

Authors: Zheng Wang, Daniel Powell, Björn Franke, Michael O’Boyle

DOI: 10.1007/978-3-642-54807-9_9

Keywords:

Abstract: General purpose GPUs provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit GPUs for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCL code from sequential programs. We exploit the fact that dependence profiling provides us with parallel loop candidates that are highly likely to be genuinely parallel, but cannot be statically proven so. For the efficient GPU parallelisation of those probably parallel loop candidates, we propose a novel software speculation scheme, which ensures correctness for the unlikely, yet possible case of dynamically detected dependence violations. Our scheme operates in place and supports speculative read and write operations. We demonstrate the effectiveness of our approach in detecting and exploiting parallelism using sequential codes from the NAS benchmark suite. We achieve an average speedup of 3.2x, and up to 99x, over the sequential baseline. On average, our approach is 1.42 times faster than state-of-the-art speculation schemes and corresponds to 99% of the performance level of a manual GPU implementation developed by independent expert programmers.

References (31)
Lawrence Rauchwerger. Speculative Parallelization of Loops. Parallel Computing, pp. 1901-1912 (2011).
Varun Mishra, Sanjeev K. Aggarwal. Partool: a feedback-directed parallelizer. APPT'11: Proceedings of the 9th International Conference on Advanced Parallel Processing Technologies, pp. 157-171 (2011). DOI: 10.1007/978-3-642-24151-2_12
Peng Wu, Arun Kejariwal, Călin Caşcaval. Compiler-Driven Dependence Profiling to Guide Program Parallelization. Languages and Compilers for Parallel Computing, pp. 232-248 (2008). DOI: 10.1007/978-3-540-89740-8_16
Cosmin E. Oancea, Alan Mycroft. Set-Congruence Dynamic Analysis for Thread-Level Speculation (TLS). Languages and Compilers for Parallel Computing, pp. 156-171 (2008). DOI: 10.1007/978-3-540-89740-8_11
Alain Ketterlin, Philippe Clauss. Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization. International Symposium on Microarchitecture, pp. 437-448 (2012). DOI: 10.1109/MICRO.2012.47
Rajeshwar Vanka, James Tuck. Efficient and accurate data dependence profiling using software signatures. Symposium on Code Generation and Optimization, pp. 186-195 (2012). DOI: 10.1145/2259016.2259041
D. Grewe, Zheng Wang, M. F. P. O'Boyle. Portable mapping of data parallel programs to OpenCL for heterogeneous systems. Symposium on Code Generation and Optimization, pp. 1-10 (2013). DOI: 10.1109/CGO.2013.6494993
Cosmin E. Oancea, Alan Mycroft, Tim Harris. A lightweight in-place implementation for software thread-level speculation. ACM Symposium on Parallel Algorithms and Architectures, pp. 223-232 (2009). DOI: 10.1145/1583991.1584050
Sangmin Seo, Gangwon Jo, Jaejin Lee. Performance characterization of the NAS Parallel Benchmarks in OpenCL. IEEE International Symposium on Workload Characterization, pp. 137-148 (2011). DOI: 10.1109/IISWC.2011.6114174
Hongtao Yu, Zhiyuan Li. Fast loop-level data dependence profiling. Proceedings of the 26th ACM International Conference on Supercomputing (ICS '12), pp. 37-46 (2012). DOI: 10.1145/2304576.2304584