Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs

Authors: Dafei Huang, Mei Wen, Changqing Xun, Dong Chen, Xing Cai

Keywords: Many core, Parallel computing, Multi-core processor, Coprocessor, Central processing unit, Software portability, Thread (computing), Locality, Computer science

Abstract: When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. When executing GPU-specific kernels on CPUs, local-memory arrays no longer match well with the hardware, and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns by using array-access descriptors derived from GPU-specific kernels, which can then be adapted for CPUs by removing all the unwanted local-memory arrays together with the obsolete barrier statements. Experiments show that the automated transformation can satisfactorily improve OpenCL kernel performance on a Sandy Bridge CPU and on Intel's Many-Integrated-Core coprocessor.
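The idea in the abstract can be illustrated with a minimal C sketch. It is not the authors' transformation or tool output: the function names, the work-group size `WG_SIZE`, and the toy neighbor-sum computation are all assumptions made for illustration, and the array-access descriptor analysis that would justify the rewrite is not shown. The first function mimics a coarsened work-group that still inherits the GPU-style staging into a local-memory array, with the barrier realized as a split between two loops. The second function shows what the same work-group might look like once the local array and the barrier are removed and the reads go straight to the global input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work-group size, chosen only for this sketch. */
enum { WG_SIZE = 4 };

/* Coarsened work-group that inherits the GPU-style locality code:
 * one loop iteration stands in for one work-item. */
void run_group_with_local(const float *in, float *out, size_t group)
{
    assert(in != NULL && out != NULL && in != out);

    float tile[WG_SIZE];              /* stands in for a __local array */
    size_t base = group * WG_SIZE;

    /* Phase 1: every work-item copies its element into the tile. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        tile[lid] = in[base + lid];

    /* A barrier(CLK_LOCAL_MEM_FENCE) sat here in the GPU kernel;
     * after coarsening it becomes the split between the two loops. */

    /* Phase 2: every work-item reads its own and its neighbor's element. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        out[base + lid] = tile[lid] + tile[(lid + 1) % WG_SIZE];
}

/* Same work-group with the local array and the barrier removed:
 * each tile read is replaced by the global read it was copied from. */
void run_group_direct(const float *in, float *out, size_t group)
{
    assert(in != NULL && out != NULL && in != out);

    size_t base = group * WG_SIZE;

    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        out[base + lid] = in[base + lid] + in[base + (lid + 1) % WG_SIZE];
}
```

Both functions compute the same result for a given group as long as `in` and `out` do not alias. The second version needs only one loop and no temporary buffer, which is the kind of simplification the abstract describes for CPUs, where the caches already serve the role the local array played on the GPU.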

References (10)

John A. Stratton, Sam S. Stone, Wen-mei W. Hwu. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. Languages and Compilers for Parallel Computing, pp. 16-30 (2008). 10.1007/978-3-540-89740-8_2

Sangmin Seo, Gangwon Jo, Jaejin Lee, Jun Lee. Automatic OpenCL work-group size selection for multicore CPUs. International Conference on Parallel Architectures and Compilation Techniques, pp. 387-398 (2013). 10.5555/2523721.2523772

Jayanth Gummaraju, Laurent Morichetti, Michael Houston, Ben Sander, Benedict R. Gaster, Bixia Zheng. Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors. International Conference on Parallel Architectures and Compilation Techniques, pp. 205-216 (2010). 10.1145/1854273.1854302

Cedric Bastoul. Code Generation in the Polyhedral Model Is Easier Than You Think. International Conference on Parallel Architectures and Compilation Techniques, pp. 7-16 (2004). 10.5555/1025127.1025992

S.J. Pennycook, S.D. Hammond, S.A. Wright, J.A. Herdman, I. Miller, S.A. Jarvis. An investigation of the performance portability of OpenCL. Journal of Parallel and Distributed Computing, vol. 73, pp. 1439-1450 (2013). 10.1016/J.JPDC.2012.07.005

M Manikandan, U Bondhugula, S Krishnamoorthy, J Ramanujam, A Rountev, P Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. Proceedings of the 22nd Annual International Conference on Supercomputing (ICS '08), pp. 225-234 (2008). 10.1145/1375527.1375562

John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, Wen-mei W. Hwu. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. Symposium on Code Generation and Optimization, pp. 111-119 (2010). 10.1145/1772954.1772971

V. Balasundaram, K. Kennedy. A technique for summarizing data access and its use in parallelism enhancing transformations. Proceedings of the ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation (PLDI '89), vol. 24, pp. 41-53 (1989). 10.1145/73141.74822

Wen-mei W. Hwu, John A. Stratton, Thomas B. Jablin, Hee-Seok Kim. Performance Portability in Accelerated Parallel Kernels. hgpu.org (2013).

V. Balasundaram, K. Kennedy. A technique for summarizing data access and its use in parallelism enhancing transformations. SIGPLAN Notices (1989). 10.1145/74818.74822

Similar articles (4)

Enabling a Uniform OpenCL Device View for Heterogeneous Platforms

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

Program Correctness by Transformation

Towards verified construction of correct and optimised GPU software
