Authors: Zheng Wang, Dominik Grewe, Michael F. P. O'Boyle
DOI: 10.1145/2677036
Keywords: Scheme (programming language), Host (network), Code (cryptography), Multi-core processor, Computer science, Key (cryptography), Parallel computing, Code generation, Benchmark (computing), Compiler
Abstract: General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially those that improve performance on GPU architectures, and uses automatic machine learning to build a predictive model that determines whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multicore host. We applied our approach to the entire NAS benchmark suite and evaluated it on two distinct GPU-based systems. We achieved average (maximum) speedups of 4.51× and 4.20× (143× and 67×) on the Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. On average, our approach achieves speedups greater than 10× over two state-of-the-art GPU code generators.
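The abstract's decision step chooses, per program, between running the generated OpenCL code on the GPU and running the original OpenMP code on the multicore host. One way to picture that step is as a supervised classifier trained on measured CPU and GPU runtimes. The sketch below is illustrative only. The feature names (`compute_to_memory`, `coalesced_fraction`, `transfer_per_op`, `log2_work_items`), the training data, and the nearest-centroid classifier are all hypothetical stand-ins, not the features or learning algorithm the article actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist


# Hypothetical per-region features; NOT the feature set used in the article.
@dataclass(frozen=True)
class KernelFeatures:
    compute_to_memory: float   # arithmetic operations per memory access
    coalesced_fraction: float  # share of GPU memory accesses that coalesce (0..1)
    transfer_per_op: float     # host<->device bytes moved per arithmetic operation
    log2_work_items: float     # log2 of the number of parallel loop iterations

    def vector(self) -> tuple[float, ...]:
        return (self.compute_to_memory, self.coalesced_fraction,
                self.transfer_per_op, self.log2_work_items)


def label_from_timings(cpu_seconds: float, gpu_seconds: float) -> str:
    """Training label: whichever version (OpenMP on CPU, OpenCL on GPU) ran faster."""
    return "GPU" if gpu_seconds < cpu_seconds else "CPU"


class NearestCentroidPredictor:
    """Stand-in classifier: picks the device whose class centroid is closest."""

    def fit(self, samples: list[KernelFeatures], labels: list[str]) -> NearestCentroidPredictor:
        vectors = [s.vector() for s in samples]
        n, count = len(vectors[0]), len(vectors)
        # z-score scaling so that no single feature dominates the distance.
        self.mean = [sum(v[i] for v in vectors) / count for i in range(n)]
        self.std = [
            (sum((v[i] - self.mean[i]) ** 2 for v in vectors) / count) ** 0.5 or 1.0
            for i in range(n)
        ]
        # One centroid (mean scaled feature vector) per device label.
        self.centroids = {}
        for label in set(labels):
            members = [self._scale(v) for v, lab in zip(vectors, labels) if lab == label]
            self.centroids[label] = [sum(m[i] for m in members) / len(members) for i in range(n)]
        return self

    def _scale(self, v: tuple[float, ...]) -> list[float]:
        return [(x - m) / s for x, m, s in zip(v, self.mean, self.std)]

    def predict(self, sample: KernelFeatures) -> str:
        x = self._scale(sample.vector())
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


# Hypothetical training set: (features, CPU time in s, GPU time in s) per region.
training = [
    (KernelFeatures(8.0, 0.90, 0.05, 20.0), 3.2, 0.4),
    (KernelFeatures(6.0, 0.80, 0.10, 18.0), 1.9, 0.3),
    (KernelFeatures(10.0, 0.95, 0.02, 22.0), 7.5, 0.6),
    (KernelFeatures(0.5, 0.20, 2.00, 10.0), 0.02, 0.15),
    (KernelFeatures(1.0, 0.30, 1.50, 8.0), 0.01, 0.09),
    (KernelFeatures(0.8, 0.10, 3.00, 12.0), 0.05, 0.30),
]
model = NearestCentroidPredictor().fit(
    [f for f, _, _ in training],
    [label_from_timings(cpu, gpu) for _, cpu, gpu in training],
)
print(model.predict(KernelFeatures(9.0, 0.9, 0.04, 21.0)))  # GPU
print(model.predict(KernelFeatures(0.6, 0.2, 2.5, 9.0)))    # CPU
```

The labels come from timing both versions and keeping the faster one. This mirrors the idea of learning the CPU/GPU choice from measurements rather than hand-written heuristics. Any classifier with the same fit/predict shape could replace the nearest-centroid stand-in.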