Ant: A Debugging Framework for MPI Parallel Programs

Authors: Jae-Woo Lee, Leonardo R. Bachega, Samuel P. Midkiff, Y. C. Hu

DOI: 10.1007/978-3-642-37658-0_15

Keywords: Distributed computing, Anomaly detection, Computer science, Parallel computing, Debugging

Abstract: This paper describes Ant, a debugging framework targeting MPI parallel programs. Ant statically analyzes programs, marking code regions as being executed by all processes or only by some of the processes. The analyzed program is then instrumented with calls to an invariant violation monitoring and detection library. The analysis allows monitoring to be based on whether all, or fewer than all, processes execute a region. Ant's instrumentation strategy allows monitoring to be sampled across processes in regions executed by all processes. We present a case study using C-DIDUCE (a variant of DIDUCE for C) to find violations of value invariants in C/MPI programs. Ant reduces overhead by more than 14 times, with less impact on accuracy than a scheme that simply distributes monitoring across the processes executing the program.

References (23)
Steve Sistare, Erica Dorenkamp, Nick Nevin, Eugene Loh. MPI Support in the Prism™ Programming Environment. Conference on High Performance Computing (Supercomputing), p. 22 (1999). DOI: 10.1145/331532.331554
Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supinski, Gregory L. Lee, Barton P. Miller, Martin Schulz. Stack Trace Analysis for Large Scale Debugging. International Parallel and Distributed Processing Symposium, pp. 1–10 (2007). DOI: 10.1109/IPDPS.2007.370254
Sudheendra Hangal, Monica S. Lam. Tracking down software bugs using automatic anomaly detection. International Conference on Software Engineering, pp. 291–301 (2002). DOI: 10.1145/581339.581377
Chao Liu, Xifeng Yan, Long Fei, Jiawei Han, Samuel P. Midkiff. SOBER: statistical model-based bug localization. Foundations of Software Engineering, vol. 30, pp. 286–295 (2005). DOI: 10.1145/1081706.1081753
Qi Gao, Feng Qin, Dhabaleswar K. Panda. DMTracker. Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07), pp. 15– (2007). DOI: 10.1145/1362622.1362643
Bronis R. de Supinski, Ben Liblit, Matthew Legendre, Dorian C. Arnold, Dong H. Ahn, Gregory L. Lee, Martin Schulz, Barton P. Miller. Lessons learned at 208K: towards debugging millions of cores. IEEE International Conference on High Performance Computing, Data and Analytics, pp. 26– (2008). DOI: 10.5555/1413370.1413397
Thomas Ostrand, Tarak Goradia, Monica Hutchins, Herb Foster. Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria. International Conference on Software Engineering, pp. 191–200 (1994). DOI: 10.5555/257734.257766
Pin Zhou, Wei Liu, Long Fei, Shan Lu, Feng Qin, Yuanyuan Zhou, S. Midkiff, J. Torrellas. AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants. International Symposium on Microarchitecture, pp. 269–280 (2004). DOI: 10.1109/MICRO.2004.3
Ben Liblit, Alex Aiken, Alice X. Zheng, Michael I. Jordan. Bug isolation via remote program sampling. Programming Language Design and Implementation, vol. 38, pp. 141–154 (2003). DOI: 10.1145/780822.781148
Doreen Cheng, Robert Hood. A portable debugger for parallel and distributed programs. Conference on High Performance Computing (Supercomputing), pp. 723–732 (1994). DOI: 10.5555/602770.602889