Profiling Non-numeric OpenSHMEM Applications with the TAU Performance System

Authors: John Linford, Tyler A. Simon, Sameer Shende, Allen D. Malony

DOI: 10.1007/978-3-319-05215-1_8

Abstract: The recent development of a unified SHMEM framework, OpenSHMEM, has enabled further study in the porting and scaling of applications that can benefit from the SHMEM programming model. This paper focuses on non-numerical graph algorithms, which typically have a low FLOPS/byte ratio. An overview of the space and time complexity of Kruskal's and Prim's algorithms for generating a minimum spanning tree (MST) is presented, along with an implementation of Kruskal's algorithm that uses OpenSHMEM to generate the MST in parallel without intermediate communication. Additionally, a procedure for applying the TAU Performance System to OpenSHMEM applications to produce in-depth performance profiles showing time spent in code regions, memory access patterns, and network load is presented. Performance evaluations from the Cray XK7 "Titan" system at Oak Ridge National Laboratory and a 48-core shared memory system at the University of Maryland, Baltimore County are provided.

References (18)
Hans Meuer, E. Strohmaier, J. Dongarra, Horst Simon, Top500 Supercomputer Sites. University of Tennessee (1997).
Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr, Scalable Parallel Trace-Based Performance Analysis. Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol. 4, pp. 303-312 (2006). DOI: 10.1007/11846802_43
Andreas Knüpfer, Ronny Brendel, Holger Brunst, Hartmut Mix, Wolfgang E. Nagel, Introducing the Open Trace Format (OTF). International Conference on Computational Science, pp. 526-533 (2006). DOI: 10.1007/11758525_71
Matthias Birkner, José Alfredo López-Mimbela, Anton Wakolbinger, Blow-up of semilinear PDE's at the critical dimension. A probabilistic approach. Proceedings of the American Mathematical Society, vol. 130, pp. 2431-2442 (2002). DOI: 10.1090/S0002-9939-02-06322-0
A. Knupfer, H. Brunst, W.E. Nagel, High performance event trace visualization. Parallel, Distributed and Network-Based Processing, pp. 258-263 (2005). DOI: 10.1109/EMPDP.2005.24
Shirley Browne, Jack Dongarra, Nathan Garner, George Ho, Philip Mucci, A Portable Programming Interface for Performance Evaluation on Modern Processors. IEEE International Conference on High Performance Computing, Data and Analytics, vol. 14, pp. 189-204 (2000). DOI: 10.1177/109434200001400303
Joseph B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, vol. 7, pp. 48-50 (1956). DOI: 10.1090/S0002-9939-1956-0078686-7
David A. Bader, Guojing Cong, Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. Journal of Parallel and Distributed Computing, vol. 66, pp. 1366-1378 (2006). DOI: 10.1016/J.JPDC.2006.06.001
Jithin Jose, Krishna Kandalla, Miao Luo, Dhabaleswar K. Panda, Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation. International Conference on Parallel Processing, pp. 219-228 (2012). DOI: 10.1109/ICPP.2012.55