Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems

Authors: Zheng Wang, Dominik Grewe, Michael F. P. O’Boyle

DOI: 10.1145/2677036

Keywords: Scheme (programming language), Host (network), Code (cryptography), Multi-core processor, Computer science, Key (cryptography), Parallel computing, Code generation, Benchmark (computing), Compiler

Abstract: General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially those that improve performance on GPU architectures, and uses automatic machine learning to build a predictive model that determines whether it is worthwhile running the code on the GPU or on the multicore host. We applied our approach to the entire NAS benchmark suite and evaluated it on two distinct GPU-based systems. We achieved average (maximum) speedups of 4.51× and 4.20× (up to 143× and 67×) over a sequential baseline on the Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively. On average, our approach is more than 10× faster than two state-of-the-art code generators.
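The abstract describes a learned predictive model that decides, per program, whether to run on the GPU or on the multicore host. The sketch below illustrates that idea with a tiny greedy decision-tree learner. It is a minimal sketch under stated assumptions, not the article's model. The two features (`transfer_ratio`, `coalesced_ratio`), the training data, and every name (`build`, `predict`, `Node`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical static features of one parallel loop/kernel (illustrative
# assumptions, not the article's feature set):
#   [0] transfer_ratio  - host<->device bytes per unit of computation
#   [1] coalesced_ratio - fraction of global-memory accesses that coalesce
Kernel = Tuple[float, float]


@dataclass
class Node:
    label: Optional[str] = None    # leaves only: "GPU" or "CPU"
    feature: int = 0               # internal nodes: feature index to test
    threshold: float = 0.0         # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def majority(labels: List[str]) -> str:
    # Most frequent label; ties broken alphabetically for determinism.
    return max(sorted(set(labels)), key=labels.count)


def errors(labels: List[str]) -> int:
    # Misclassifications if every sample here gets the majority label.
    return len(labels) - labels.count(majority(labels))


def build(xs: List[Kernel], ys: List[str], depth: int = 0, max_depth: int = 2) -> Node:
    """Greedy decision-tree learner: pick the split with the fewest errors."""
    if depth == max_depth or len(set(ys)) == 1:
        return Node(label=majority(ys))
    best = None  # (error count, feature index, threshold)
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            if left and right:
                cost = errors(left) + errors(right)
                if best is None or cost < best[0]:
                    best = (cost, f, t)
    if best is None:  # no feature separates the samples
        return Node(label=majority(ys))
    _, f, t = best
    lo = [(x, y) for x, y in zip(xs, ys) if x[f] <= t]
    hi = [(x, y) for x, y in zip(xs, ys) if x[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=build([x for x, _ in lo], [y for _, y in lo], depth + 1, max_depth),
        right=build([x for x, _ in hi], [y for _, y in hi], depth + 1, max_depth),
    )


def predict(node: Node, kernel: Kernel) -> str:
    # Walk from the root to a leaf and return its device label.
    while node.label is None:
        node = node.left if kernel[node.feature] <= node.threshold else node.right
    return node.label


# Made-up training data: each kernel is labelled with whichever device was
# (hypothetically) measured to be faster for it.
train_x = [(0.05, 0.95), (0.10, 0.90), (0.15, 0.85),
           (0.70, 0.90), (0.80, 0.80), (0.10, 0.20)]
train_y = ["GPU", "GPU", "GPU", "CPU", "CPU", "CPU"]

model = build(train_x, train_y)
for k in [(0.08, 0.92), (0.60, 0.95), (0.12, 0.10)]:
    print(k, "->", predict(model, k))  # GPU, CPU, CPU
```

On this toy data the learner first splits on the transfer ratio (heavy host-device traffic favours the CPU), then on coalescing (poorly coalesced accesses also favour the CPU). A real model would be trained offline on measured runtimes and then queried at compile or run time. The greedy, error-count split criterion is a deliberate simplification; impurity measures such as Gini or entropy are more common in practice.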
