Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs

Authors: Allen D. Malony, Scott Biersdorff, Sameer Shende, Heike Jagode, Stanimire Tomov

DOI: 10.1109/ICPP.2011.71

Keywords: Parallel computing, Computer science, Computer architecture, Set (abstract data type), Coprocessor, CUDA, Computation, Performance measurement

Abstract: The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for reporting new capabilities for heterogeneous parallel performance measurement in three leading HPC tools: PAPI, Vampir, and the TAU Performance System. Our work leverages the new CUPTI tool support in NVIDIA's CUDA device library. Heterogeneous benchmarks from the SHOC suite are used to demonstrate the measurement methods and tool support.

References (12)
Matthias Jurenz, Hartmut Mix, Andreas Knüpfer, Matthias Lieber, Wolfgang E. Nagel, Holger Brunst, Matthias S. Müller. Developing Scalable Applications with Vampir, VampirServer and VampirTrace. Parallel Computing, pp. 637-644 (2007).
Holger Brunst, Daniel Hackenberg, Guido Juckeland, Heide Rohling. Comprehensive Performance Tracking with Vampir 7. Parallel Tools Workshop, pp. 17-29 (2010). DOI: 10.1007/978-3-642-11261-4_2
Andreas Knüpfer, Holger Brunst, Jens Doleschal, Matthias Jurenz, Matthias Lieber, Holger Mickler, Matthias S. Müller, Wolfgang E. Nagel. The Vampir Performance Analysis Tool-Set. Parallel Tools Workshop, pp. 139-155 (2008). DOI: 10.1007/978-3-540-68564-7_9
Dan Terpstra, Heike Jagode, Haihang You, Jack Dongarra. Collecting Performance Data with PAPI-C. IEEE International Conference on High Performance Computing, Data, and Analytics, pp. 157-173 (2010). DOI: 10.1007/978-3-642-11261-4_11
Wen-mei W. Hwu, David B. Kirk. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann (2012).
Shirley Browne, Jack Dongarra, Nathan Garner, George Ho, Philip Mucci. A Portable Programming Interface for Performance Evaluation on Modern Processors. IEEE International Conference on High Performance Computing, Data, and Analytics, vol. 14, pp. 189-204 (2000). DOI: 10.1177/109434200001400303
Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam. An Experimental Approach to Performance Measurement of Heterogeneous Parallel Applications Using CUDA. Proceedings of the 24th ACM International Conference on Supercomputing (ICS '10), pp. 127-136 (2010). DOI: 10.1145/1810085.1810105
Robert Dietrich, Thomas Ilsche, Guido Juckeland. Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures. 2010 39th International Conference on Parallel Processing Workshops, pp. 135-143 (2010). DOI: 10.1109/ICPPW.2010.30
Sameer S. Shende, Allen D. Malony. The TAU Parallel Performance System. IEEE International Conference on High Performance Computing, Data, and Analytics, vol. 20, pp. 287-311 (2006). DOI: 10.1177/1094342006064482
Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, Jeffrey S. Vetter. The Scalable Heterogeneous Computing (SHOC) Benchmark Suite. Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 63-74 (2010). DOI: 10.1145/1735688.1735702