Lightweight Dependency Checking for Parallelizing Loops with Non-Deterministic Dependency on GPU

Authors: Hongyuan Liu, King Tin Lam, Huanxin Lin, Cho-Li Wang, Junchao Ma

DOI: 10.1109/ICPADS.2016.0119

Keywords: Dependency (UML), General-purpose computing on graphics processing units, Parallel computing, Speedup, Code generation, Compiler, Lockstep, Computer science, SIMD

Abstract: General-purpose GPUs have been prevalent for a decade. Nevertheless, GPU programming remains an onerous job, practically exclusive to veteran developers who must know both the domain-specific knowledge and the GPU architecture well. Although current parallelizing compilers that automatically parallelize and offload sizable loops onto the GPU have helped in unfettering the power of the GPU with minimal programming effort, there is still a family of loops that carry statically non-deterministic data dependencies and cannot be parallelized. To tackle this issue, we propose two lightweight dependency checking schemes, very different from existing conservative schemes, to assist in parallelizing loops with non-deterministic dependencies. Our schemes feature work complexity linear in the number of memory operations, lower memory consumption compared to previous work, and fewer false positives by leveraging lockstep execution on the GPU's SIMD lanes. Experiments done using microbenchmarking and real-life applications on the latest advanced AMD discrete GPU show that our schemes can achieve a 2.2× speedup over existing solutions in dependency-free cases while only taking about 20% time in the case of unproven loop-carried dependencies.
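To make the notion of a statically non-deterministic dependency concrete, the following is a minimal sequential sketch, not the authors' GPU schemes. It assumes a loop of the form `a[write_idx[i]] = a[read_idx[i]] + 1`, where the index arrays are known only at run time, so a compiler cannot prove the iterations independent. The function names `has_loop_carried_dependency` and `run_loop` are hypothetical and chosen for illustration only.

```python
def has_loop_carried_dependency(read_idx, write_idx):
    """Return True if one iteration accesses an element written by another.

    Illustrative host-side check (an assumption, not the paper's method):
    it visits each memory operation once, using a dictionary as the
    bookkeeping structure.
    """
    writer = {}  # element index -> iteration that writes it
    for i, w in enumerate(write_idx):
        if w in writer:
            return True  # two iterations write the same element
        writer[w] = i
    for i, r in enumerate(read_idx):
        j = writer.get(r)
        if j is not None and j != i:
            return True  # iteration i reads an element iteration j writes
    return False


def run_loop(a, read_idx, write_idx):
    """Sequential reference version of the loop being analysed."""
    for r, w in zip(read_idx, write_idx):
        a[w] = a[r] + 1
    return a
```

With `read_idx = [0, 1, 2]` and `write_idx = [1, 2, 3]`, each iteration reads the element the previous one wrote, so the check reports a dependency and the loop must run in order. With identical read and write indices, every iteration touches only its own element and the loop could safely run in parallel.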


