Authors: Bronis R. de Supinski, Ben Liblit, Matthew Legendre, Dorian C. Arnold, Dong H. Ahn
Keywords: Process (engineering), Stack trace, System software, Petascale computing, Distributed computing, Computer science, InfiniBand, File system, Debugging, Scalability
Abstract: Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool will itself become a large parallel application; already, debugging the full Blue Gene/L (BG/L) installation at Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own processes efficiently. Some system resources, such as the file system, may also become bottlenecks. In this paper, we examine the challenges of petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application's processes to identify process equivalence classes. We present results gathered at thousands of tasks on an InfiniBand cluster and at up to 208K processes on BG/L, identifying current scalability issues as well as those that will be faced at the petascale. We then implemented solutions to these issues and show the resulting improvements. Finally, we discuss future plans to meet the demands of petascale machines.
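The idea of merging stack traces into equivalence classes can be illustrated with a small sketch. The Python below is a hypothetical, single-process illustration and not STAT's implementation: it assumes each task's stack trace arrives as a list of function names ordered from outermost to innermost frame, merges the traces into a prefix tree whose nodes record which task ranks reached that frame, and treats tasks with identical full traces as one equivalence class. The names `merge_traces`, `equivalence_classes`, `render`, and the sample frames are invented for illustration.

```python
from collections import defaultdict


class Node:
    """One frame in the merged call-prefix tree (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.tasks = set()    # task ranks whose traces pass through this frame
        self.children = {}    # frame name -> Node, in first-seen order


def merge_traces(traces):
    """Merge per-task traces (outermost frame first) into one prefix tree."""
    root = Node("<root>")
    for task, frames in traces.items():
        node = root
        node.tasks.add(task)
        for frame in frames:
            # Reuse the child if another task already followed this path.
            node = node.children.setdefault(frame, Node(frame))
            node.tasks.add(task)
    return root


def equivalence_classes(traces):
    """Group tasks whose complete stack traces are identical."""
    classes = defaultdict(list)
    for task, frames in traces.items():
        classes[tuple(frames)].append(task)
    return {trace: sorted(tasks) for trace, tasks in classes.items()}


def render(node, depth=0):
    """Return the tree as indented lines: 'frame [task ranks]'."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.name} {sorted(child.tasks)}")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical snapshot: three tasks blocked in a collective, one in I/O.
    traces = {
        0: ["main", "solve", "MPI_Allreduce"],
        1: ["main", "solve", "MPI_Allreduce"],
        2: ["main", "io_write", "write"],
        3: ["main", "solve", "MPI_Allreduce"],
    }
    print("\n".join(render(merge_traces(traces))))
    for trace, tasks in equivalence_classes(traces).items():
        print(" > ".join(trace), tasks)
```

In this sample, task 2 stands out as the only member of its class, which is the kind of outlier a merged view is meant to expose. The number of tree nodes grows with the number of distinct call paths rather than with the number of tasks, although each node's task set still grows with the task count; a tool operating at hundreds of thousands of processes would likely need a more compact task-set representation (a bit vector, for example) than the Python `set` assumed here.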