Towards Runtime Analytics in a Parallel Performance System

Authors: Allen D. Malony, Srinivasan Ramesh, Kevin Huck, Chad Wood, Sameer Shende

DOI: 10.1109/HPCS48598.2019.9188097

Keywords: Software engineering, Scalable computing, Computer science, Analytics, Measure (data warehouse), Performance measurement, Work (electrical)

Abstract: Developers of scientific simulations use parallel performance systems to measure, analyze, and tune their applications on large-scale HPC machines. In the majority of these systems, analysis takes place offline. More consequentially, if runtime analytics are desired, measurement infrastructures need to be designed and implemented in such a way as to make it possible. We investigate the question of how to create runtime analytics capabilities by considering this objective for a reference platform, the TAU Performance System. Our research work identifies general issues of concern and describes how they can be addressed in a new TAU-based framework. Several case studies are proposed as different examples. These are prototyped, evaluated on HPC machines, and discussed. The outcomes of the study suggest that the approach has merit. Furthermore, we believe the approach could directly carry forward to other systems.

References (13)
Jesus Labarta, Toni Cortes, Vincent Pillet, Sergi Girona, PARAVER: A Tool to Visualize and Analyze Parallel Code (2007)
William Gropp, Ewing Lusk, Nathan Doss, Anthony Skjellum, A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, vol. 22, pp. 789-828 (1996), 10.1016/0167-8191(96)00024-5
Don Maghrak, Martin Schulz, Jim Galarowicz, Scott Cranford, David Montoya, William Hachfeld, Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis. Scientific Programming, vol. 16, pp. 105-121 (2008), 10.3233/SPR-2008-0256
B.P. Miller, M.D. Callaghan, J.M. Cargille, J.K. Hollingsworth, R.B. Irvin, K.L. Karavanic, K. Kunchithapadam, T. Newhall, The Paradyn parallel performance measurement tool. IEEE Computer, vol. 28, pp. 37-46 (1995), 10.1109/2.471178
W. Huang, G. Santhanaraman, H.-W. Jin, Q. Gao, D.K. Panda, Design of High Performance MVAPICH2: MPI2 over InfiniBand. Cluster Computing and the Grid, vol. 1, pp. 43-48 (2006), 10.1109/CCGRID.2006.32
Sameer S. Shende, Allen D. Malony, The TAU Parallel Performance System. International Journal of High Performance Computing Applications, vol. 20, pp. 287-311 (2006), 10.1177/1094342006064482
An Autonomic Performance Environment for Exascale. Supercomputing Frontiers and Innovations: an International Journal, vol. 2, pp. 49-66 (2015), 10.14529/JSFI150305
Marc Buffat, Anne Cadiou, Lionel Le Penven, Christophe Pera, In situ analysis and visualization of massively parallel computations. International Journal of High Performance Computing Applications, vol. 31, pp. 83-90 (2017), 10.1177/1094342015597081
A. C. Bauer, H. Abbasi, J. Ahrens, H. Childs, B. Geveci, S. Klasky, K. Moreland, P. O'Leary, V. Vishwanath, B. Whitlock, E. W. Bethel, In situ methods, infrastructures, and applications on high performance computing platforms. IEEE VGTC Conference on Visualization, vol. 35, pp. 577-597 (2016), 10.1111/CGF.12930
S. Sanchez, A. Bonnie, G. Van Heule, C. Robinson, A. DeConinck, K. Kelly, Q. Snead, J. Brandt, Design and Implementation of a Scalable HPC Monitoring System. International Parallel and Distributed Processing Symposium, pp. 1721-1725 (2016), 10.1109/IPDPSW.2016.167