Accelerating multicore reuse distance analysis with sampling and parallelization

Authors: Derek L. Schuff, Milind Kulkarni, Vijay S. Pai

DOI: 10.1145/1854273.1854286

Keywords: Overhead (computing), Data structure, Multi-core processor, Parallel computing, Computer science, Optimizing compiler, Cache, Reuse, Multithreading, Shared memory

Abstract: Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading, or suffer severe performance penalties. This paper presents a sampled, parallelized method of measuring reuse distance profiles for multithreaded programs, modeling private and shared cache configurations. The sampling technique allows it to spend much of its execution in a fast low-overhead mode, and allows the use of a new measurement method, since sampled analysis does not need to consider the full state of the reuse stack. This measurement method uses O(1) data structures that may be made thread-private, allowing parallelization to reduce overhead in analysis mode. The performance of the resulting system is analyzed for a diverse set of parallel benchmarks and shown to generate accurate output compared to non-sampled full analysis, as well as good results for the common application of locating low-locality code in the benchmarks, all with a performance overhead comparable to the best single-threaded analysis techniques.
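For readers unfamiliar with the term, the reuse distance of a memory access is the number of distinct addresses touched since the previous access to the same address. The sketch below is a naive, single-threaded, full-stack measurement written only to illustrate that definition. It is not the sampled, parallelized method the abstract describes, and the names `reuse_distances` and `miss_ratio` are hypothetical helpers introduced here.

```python
import math


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    Naive LRU-stack approach: each lookup scans the stack, so the cost is
    O(N * M) for N accesses and M distinct addresses. Illustrative only.
    """
    stack = []      # LRU stack; the most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Entries above `pos` are the distinct addresses touched
            # since the previous access to `addr`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(math.inf)  # first (cold) access
        stack.append(addr)
    return distances


def miss_ratio(distances, cache_lines):
    """Predicted miss ratio for a fully associative LRU cache.

    An access hits only if its reuse distance is below the capacity.
    """
    misses = sum(1 for d in distances if d >= cache_lines)
    return misses / len(distances)
```

Under these assumptions, one profile predicts the miss ratio for any capacity of a fully associative LRU cache: `miss_ratio(reuse_distances(trace), cache_lines)` can be evaluated for several values of `cache_lines` without re-running the trace.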
