Authors: Derek L. Schuff, Milind Kulkarni, Vijay S. Pai
Keywords: Overhead (computing), Data structure, Multi-core processor, Parallel computing, Computer science, Optimizing compiler, Cache, Reuse, Multithreading, Shared memory
Abstract: Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading, or suffer severe performance penalties. This paper presents a sampled, parallelized method of measuring reuse distance profiles for multithreaded programs, modeling private and shared cache configurations. The sampling technique allows it to spend much of its execution in a fast low-overhead mode, and allows the use of a new measurement method since sampled analysis does not need to consider the full state of the reuse stack. This measurement method uses O(1) data structures that may be made thread-private, allowing parallelization to reduce overhead in analysis mode. The performance of the resulting system is analyzed for a diverse set of parallel benchmarks and shown to generate accurate output compared to non-sampled full analysis, as well as good results for the common application of locating low-locality code in the benchmarks, all with a performance overhead comparable to the best single-threaded analysis techniques.
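
For readers unfamiliar with the term, the reuse distance of a memory access is the number of distinct addresses touched since the previous access to the same address. The Python sketch below is only the textbook, unsampled, single-threaded LRU-stack formulation, not the sampled and parallelized method the abstract describes. The names `reuse_distances`, `miss_count`, and the toy trace are hypothetical and chosen for illustration.

```python
INFINITE = float("inf")  # distance assigned to a first-time (cold) access


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    Naive O(N * M) LRU-stack version (N accesses, M distinct addresses);
    an illustrative assumption, not the paper's O(1) sampled structures.
    """
    stack = []       # LRU stack; the most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Entries above `pos` are the distinct addresses touched
            # since the previous access to `addr`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(INFINITE)
        stack.append(addr)
    return distances


def miss_count(distances, cache_lines):
    """Misses in a fully associative LRU cache holding `cache_lines` lines.

    An access hits exactly when its reuse distance is below the capacity.
    """
    return sum(1 for d in distances if d >= cache_lines)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]   # toy address trace
    dists = reuse_distances(trace)
    print(dists)
    print(miss_count(dists, 2), miss_count(dists, 3))
```

A single pass over the trace therefore predicts the miss count for every fully associative LRU cache size at once, which is why reuse distance profiles are useful for cache performance prediction.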