Authors: Xu Liu, John Mellor-Crummey
Keywords: Computer science, Distributed computing, Database-centric architecture, Profiling (computer programming), Locality, Latency (engineering), Scalability
Abstract: It is difficult to manually identify opportunities for enhancing data locality. To address this problem, we extended the HPCToolkit performance tools to support data-centric profiling of scalable parallel programs. Our tool uses hardware counters to directly measure memory access latency and attributes latency metrics to both variables and instructions. Different hardware counters provide insight into different aspects of data locality (or lack thereof). Unlike prior tools for data-centric analysis, our tool employs scalable measurement, analysis, and presentation methods that enable it to analyze the memory access behavior of scalable parallel programs with low runtime and space overhead. We demonstrate the utility of HPCToolkit's new data-centric analysis capabilities with case studies of five well-known benchmarks. In each benchmark, we identify performance bottlenecks caused by poor data locality and demonstrate non-trivial performance optimizations enabled by this guidance.