Authors: Allen D. Malony, Scott Biersdorff, Sameer Shende, Heike Jagode, Stanimire Tomov
DOI: 10.1109/ICPP.2011.71
Keywords: Parallel computing, Computer science, Computer architecture, Set (abstract data type), Coprocessor, CUDA, Computation, Performance measurement
Abstract: The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement on machines with GPUs. A computation model and alternative host-GPU approaches are discussed to set the stage for reporting capabilities in three leading HPC tools: PAPI, Vampir, and the TAU Performance System. Our work leverages the CUPTI tool support in NVIDIA's CUDA device library. Heterogeneous benchmarks from the SHOC suite are used to demonstrate the methods and tool support.