Authors: Thomas R. W. Scogland, Mayank Daga, Wu-chun Feng
DOI:
Keywords: Parallel computing, TOP500, Graphics, General-purpose computing on graphics processing units, CUDA, Node (networking), Computer science, Computer cluster, Graphics processing unit, Isolation (database systems)
Abstract: The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific optimizations, the AMD Radeon HD 5870 delivers 65% better performance than with the well-known NVIDIA-specific optimizations.
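To make the listed optimizations concrete, the following is a minimal, hypothetical OpenCL C kernel sketch (not taken from the paper or from GEM) illustrating three of them: explicit caching of reused values in registers, float4 vector types, and branch removal via select(). The kernel name and arguments are assumptions for illustration only.

```c
/* Hypothetical OpenCL C kernel sketch; illustrates explicit register use,
 * vector types, and branch removal. Not the authors' GEM kernel. */
__kernel void accumulate(__global const float4 *points,
                         __global float *out,
                         const float4 query,
                         const int n)
{
    int gid = get_global_id(0);
    if (gid >= n) return;  /* standard bounds guard */

    /* (i) Explicit register use: read the global value once into a
     * private (register-resident) variable instead of re-reading it. */
    float4 p = points[gid];

    /* (ii) Vector types: operate on all four components at once. */
    float4 d = p - query;
    float dist2 = dot(d, d);

    /* (iii) Branch removal: select() chooses between two values without
     * a divergent if/else on the distance test. */
    float contrib = select(0.0f, native_rsqrt(dist2), dist2 > 0.0f);

    out[gid] = contrib;
}
```

The fourth optimization, placing read-only global data in image memory, would correspond to binding such a buffer as an image object and sampling it with read_imagef so that accesses go through the GPU's texture path; it is omitted here for brevity.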