Authors: Hongyuan Liu, King Tin Lam, Huanxin Lin, Cho-Li Wang, Junchao Ma
Keywords: Dependency (UML), General-purpose computing on graphics processing units, Parallel computing, Speedup, Code generation, Compiler, Lockstep, Computer science, SIMD
Abstract: General-purpose GPUs have been prevalent for a decade. Nevertheless, GPU programming remains an onerous job practically exclusive to veteran developers, who must know both domain-specific knowledge and GPU architecture well. Although current parallelizing compilers that automatically parallelize and offload sizable loops onto the GPU have helped in unfettering the power of the GPU with minimal effort, there is still a family of loops that carry statically non-deterministic data dependencies and cannot be parallelized. To tackle this issue, we propose two lightweight dependency checking schemes, very different from existing conservative schemes, to assist in parallelizing loops with such dependencies. Our schemes feature linear work complexity with respect to memory operations, lower memory consumption compared to previous work, and minimal false positives by leveraging lockstep execution on the GPU's SIMD lanes. Experiments done using microbenchmarking and real-life applications on the latest advanced AMD discrete GPUs show that our schemes can achieve a 2.2× speedup over existing solutions in dependency-free cases while only taking about 20% time in the case of unproven loop-carried dependencies.