Authors: Jae-Woo Lee, Leonardo R. Bachega, Samuel P. Midkiff, Y. C. Hu
DOI: 10.1007/978-3-642-37658-0_15
Keywords: Distributed computing, Anomaly detection, Computer science, Parallel computing, Debugging
Abstract: This paper describes Ant, a debugging framework targeting MPI parallel programs. Ant statically analyzes programs, marking code regions as being executed by all processes or only some of the processes. The analyzed program is then instrumented with calls to an invariant violation monitoring and detection library. The analysis allows [...] to be based on whether all, or less than all, processes execute a region. Ant’s instrumentation strategy [...] sampled across [...] in [...]. We present a case study using C-DIDUCE (a variant of DIDUCE for C) to find violations of value invariants in C/MPI programs. [...] reduces overhead by over 14 times [...] impact on accuracy [...] a scheme that simply distributes [...] executing the program.
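
Illustrative sketch (not from the paper): the abstract's central idea is that invariant checks in code regions executed by every process can be spread across processes, while regions executed by only some processes stay fully monitored. The Python simulation below shows one way this could look. All names (`RangeInvariant`, `monitors`, `count_checks`) are hypothetical, the round-robin assignment of sites to ranks is an assumption rather than Ant's documented policy, the range check is a simplified stand-in for a value invariant and not C-DIDUCE's actual representation, and no MPI library is used: ranks are plain integers.

```python
class RangeInvariant:
    """Hypothetical value invariant: the range of values seen at one site."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def observe(self, value):
        """Return True if `value` falls outside the range seen so far,
        then widen the range to include it."""
        if self.lo is None:
            # First observation establishes the initial hypothesis.
            self.lo = self.hi = value
            return False
        violated = value < self.lo or value > self.hi
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)
        return violated


def monitors(site_id, rank, nprocs, executed_by_all):
    """Decide whether `rank` checks the invariant at `site_id`.

    Assumption: sites in regions executed by all processes are assigned
    round-robin to a single rank; other sites are checked by every rank
    that reaches them.
    """
    if executed_by_all:
        return site_id % nprocs == rank
    return True


def count_checks(site_ranks, nprocs, sampled):
    """Count invariant checks for one pass over every site.

    `site_ranks` maps a site id to the set of ranks that execute it; in a
    real tool this classification would come from static analysis.
    """
    total = 0
    for site_id, ranks in site_ranks.items():
        executed_by_all = len(ranks) == nprocs
        for rank in ranks:
            if not sampled or monitors(site_id, rank, nprocs, executed_by_all):
                total += 1
    return total


if __name__ == "__main__":
    # Two sites run on all four ranks; one site runs on rank 0 only.
    sites = {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3}, 2: {0}}
    print("checks without sampling:", count_checks(sites, 4, sampled=False))
    print("checks with sampling:   ", count_checks(sites, 4, sampled=True))
```

In this toy setup the saving comes entirely from the sites that all ranks execute; the site reached only by rank 0 is checked either way. The numbers it prints describe the simulation only and say nothing about the overhead figures reported in the abstract.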