Authors: Shirley Moore, David Cronk, Sameer Shende, Allen Malony
DOI: 10.1109/HPCMP-UGC.2006.43
Keywords: Computer engineering, Nested loop join, Military computing, Parallel computing, Profiling (computer programming), Memory performance, Fortran, Computer science, Floating point
Abstract: Performance of computationally intensive applications often depends critically on the floating point and memory performance of nested loop structures. This paper describes extensions to the Tuning and Analysis Utilities (TAU) parallel performance system that implement automated instrumentation of C/C++ and Fortran programs to collect loop-level profile data. Link-time and run-time options for configuring the instrumented version of the code to perform various types of measurements, such as time-based and hardware counter based profiling, are described. Finally, examples are given of collecting and analyzing loop-level profile data for several DoD applications.