Lessons learned at 208K: towards debugging millions of cores

Authors: Bronis R. de Supinski, Ben Liblit, Matthew Legendre, Dorian C. Arnold, Dong H. Ahn

DOI: 10.5555/1413370.1413397

Keywords: Process (engineering), Stack trace, System software, Petascale computing, Distributed computing, Computer science, InfiniBand, File system, Debugging, Scalability

Abstract: Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool itself will become a large parallel application; already, debugging the full Blue-Gene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an Infiniband cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
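The merging step the abstract describes can be illustrated with a small sketch. This is not STAT's implementation: it runs in a single process, whereas the abstract describes a tool distributed across many daemons, and the function names, the tree layout, and the sample traces are all hypothetical. It only shows the idea of merging per-task stack traces into a call-prefix tree and grouping tasks with identical call paths into equivalence classes.

```python
from collections import defaultdict

# Hypothetical sample data: task rank -> stack frames, outermost frame first.
SAMPLE_TRACES = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "io_write"],
}


def equivalence_classes(traces):
    """Group task ranks whose complete call paths are identical.

    Returns a dict mapping each call path (as a tuple) to a sorted list of ranks.
    """
    classes = defaultdict(list)
    for rank, frames in traces.items():
        classes[tuple(frames)].append(rank)
    return {path: sorted(ranks) for path, ranks in classes.items()}


def merge_prefix_tree(traces):
    """Merge all traces into one call-prefix tree.

    Each node is a dict with the set of ranks that reached it and its children
    keyed by frame name (an assumed layout, chosen for brevity).
    """
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


if __name__ == "__main__":
    for path, ranks in equivalence_classes(SAMPLE_TRACES).items():
        print(" > ".join(path), ranks)
```

With classes like these, a user can attach a full debugger to one representative task per class instead of to every task, which is the motivation for identifying equivalence classes at this scale.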

References (18)
Hans Meuer, E. Strohmaier, J. Dongarra, Horst Simon, Top500 Supercomputer Sites. University of Tennessee (1997)
B. R. de Supinski, D. C. Arnold, D. H. Ahn, G. L. Lee, M. W. Schulz, B. P. Miller, Benchmarking the Stack Trace Analysis Tool for BlueGene/L. Parallel Computing, pp. 621-628 (2007)
Markus Geimer, Felix Wolf, Björn Kuhlmann, Farzona Pulatova, Brian J. N. Wylie, Scalable Collation and Presentation of Call-Path Profile Data with CUBE. Parallel Computing, pp. 645-652 (2007)
Robert Bell, Allen D. Malony, Sameer Shende, ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis. European Conference on Parallel Processing, pp. 17-26 (2003), 10.1007/978-3-540-45209-6_7
Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr, Scalable Parallel Trace-Based Performance Analysis. Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol. 4, pp. 303-312 (2006), 10.1007/11846802_43
Hans-Christian Hoppe, Wolfgang E. Nagel, Karl Solchenbach, Michael Weber, Alfred Arnold, VAMPIR: Visualization and Analysis of MPI Resources (2010)
Aroon Nataraj, Matthew Sottile, Alan Morris, Allen D. Malony, Sameer Shende, TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring. European Conference on Parallel Processing, pp. 85-96 (2007), 10.1007/978-3-540-74466-5_11
Susanne M. Balle, Bevin R. Brett, Chih-Ping Chen, David LaFrance-Linden, Extending a Traditional Debugger to Debug Massively Parallel Applications. Journal of Parallel and Distributed Computing, vol. 64, pp. 617-628 (2004), 10.1016/J.JPDC.2004.03.012
Martin Schulz, Dong Ahn, Andrew Bernat, Bronis R. de Supinski, Steven Y. Ko, Gregory Lee, Barry Rountree, Scalable Dynamic Binary Instrumentation for Blue Gene/L. ACM SIGARCH Computer Architecture News, vol. 33, pp. 9-14 (2005), 10.1145/1127577.1127581
Don Maghrak, Martin Schulz, Jim Galarowicz, Scott Cranford, David Montoya, William Hachfeld, Open|SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis. Scientific Programming, vol. 16, pp. 105-121 (2008), 10.3233/SPR-2008-0256