Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems

Authors: Zheng Wang, Dominik Grewe, Michael F. P. O'Boyle

Keywords: Scheme, Host, Code, Multi-core processor, Computer science, Key, Parallel computing, Code generation, Benchmark (computing), Compiler

Abstract: General-purpose GPU-based systems are highly attractive, as they offer potentially massive performance at little cost. Realizing this potential is challenging, however, due to the complexity of programming them. This article presents a compiler-based approach that automatically generates optimized OpenCL code for GPUs from data-parallel OpenMP programs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures, and it uses machine learning to automatically build a predictive model that determines whether it is worthwhile to run the OpenCL code on the GPU or the OpenMP code on the multicore host. We applied our approach to the entire NAS Parallel Benchmark suite and evaluated it on two distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) over a sequential baseline on the Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970 platforms, respectively. On average, our approach achieves more than a 10× speedup over two state-of-the-art GPU code generators.
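The abstract describes a learned predictive model that decides whether to run the generated OpenCL code on the GPU or the original OpenMP code on the multicore host. The sketch below illustrates only that general idea, under loudly stated assumptions: the two features (`comp_mem_ratio`, `transfer_per_work`), the made-up toy training set, and the decision-stump learner are all hypothetical stand-ins chosen for brevity. They are not the model, features, or data used in the article.

```python
from dataclasses import dataclass
from typing import List, Sequence

# HYPOTHETICAL feature vector per kernel (not the article's feature set):
#   index 0: comp_mem_ratio    - arithmetic operations per memory access
#   index 1: transfer_per_work - host<->device bytes transferred per unit of work


@dataclass
class Stump:
    """A one-level decision tree (decision stump) choosing a device."""
    feature: int      # index into the feature vector
    threshold: float  # split point on that feature
    below: str        # device predicted when value <= threshold
    above: str        # device predicted when value > threshold

    def predict(self, x: Sequence[float]) -> str:
        return self.below if x[self.feature] <= self.threshold else self.above


def majority(labels: List[str]) -> str:
    # Hypothetical tie-break: fall back to the OpenMP host code ("cpu").
    return "gpu" if labels.count("gpu") > labels.count("cpu") else "cpu"


def train_stump(xs: List[Sequence[float]], ys: List[str]) -> Stump:
    """Pick the (feature, threshold) split with the fewest training errors."""
    best, best_err = None, len(ys) + 1
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in candidates:
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            below = majority(left) if left else "cpu"
            above = majority(right) if right else "cpu"
            err = sum(y != below for y in left) + sum(y != above for y in right)
            if err < best_err:  # strict '<' keeps the first perfect split found
                best, best_err = Stump(f, t, below, above), err
    return best


# Made-up toy training set: label = device that was faster for that kernel.
TRAIN_X = [(8.0, 0.1), (12.0, 0.2), (6.0, 0.05), (0.5, 4.0), (1.0, 2.5), (0.8, 6.0)]
TRAIN_Y = ["gpu", "gpu", "gpu", "cpu", "cpu", "cpu"]

model = train_stump(TRAIN_X, TRAIN_Y)
print(model.predict((10.0, 0.3)))  # -> gpu (compute-heavy, little data transfer)
print(model.predict((0.7, 3.0)))   # -> cpu (memory-bound, heavy data transfer)
```

A stump is about the simplest classifier that can be learned from examples; a practical predictor would likely need richer features and a more expressive model. Because the model is learned from measurements rather than hand-tuned, it could in principle be retrained for each new CPU/GPU platform, which is what makes this style of device selection portable.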

