Fast data-locality profiling of native execution

Authors: Erik Berg, Erik Hagersten

DOI: 10.1145/1064212.1064232

Keywords: Operating system, Locality, Smart Cache, Cache, Profiling (computer programming), Cache algorithms, Computer science, Embedded system, Cache coloring

Abstract: Performance tools based on hardware counters can efficiently profile the cache behavior of an application and help software developers improve its cache utilization. Simulator-based tools can potentially provide more insights and flexibility and model many different cache configurations, but have the drawback of large run-time overhead. We present StatCache, a performance tool based on a statistical cache model. It has a small run-time overhead while providing much of the flexibility of simulator-based tools. A monitor process running in the background collects sparse memory access statistics about the analyzed application running natively on a host computer. Generic locality information is derived and presented in a code-centric and/or data-centric view. We evaluate the accuracy and performance of the tool using ten SPEC CPU2000 benchmarks. We also exemplify how the flexibility of the tool can be used to better understand the characteristics of cache-related performance problems.
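
The abstract does not give the equations of the statistical cache model, so the following is only a minimal sketch of how sparse reuse-distance samples could be turned into a miss-ratio estimate. It assumes a cache with random replacement, where a line survives each miss with probability 1 - 1/L (L being the number of cache lines), and solves the resulting fixed-point equation for the miss ratio. The function name `estimate_miss_ratio`, its parameters, and the sample format are hypothetical and chosen for illustration.

```python
from typing import Sequence


def estimate_miss_ratio(
    reuse_distances: Sequence[int],
    cold_samples: int,
    cache_lines: int,
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> float:
    """Estimate the miss ratio of a cache from sampled reuse distances.

    Assumed model (not taken from the abstract): random replacement, so a
    line survives one miss with probability 1 - 1/cache_lines. A sampled
    access with reuse distance d (accesses since the same line was last
    touched) sees about d * R intervening misses, where R is the miss ratio.
    cold_samples counts sampled accesses whose line was never reused; they
    are treated as certain misses.
    """
    if cache_lines < 1:
        raise ValueError("cache_lines must be at least 1")
    total = len(reuse_distances) + cold_samples
    if total == 0:
        raise ValueError("need at least one sample")

    survive = 1.0 - 1.0 / cache_lines
    ratio = 1.0  # start from the pessimistic end; iterates only decrease
    for _ in range(max_iter):
        expected_misses = float(cold_samples)
        for d in reuse_distances:
            # Probability that the line was evicted before it was reused.
            expected_misses += 1.0 - survive ** (d * ratio)
        new_ratio = expected_misses / total
        if abs(new_ratio - ratio) < tol:
            return new_ratio
        ratio = new_ratio
    return ratio
```

Because the samples are independent of the cache size, the same set can be re-evaluated for several values of `cache_lines` without rerunning the application, which is one way to picture the flexibility the abstract refers to.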

References (35)
Alan Eustace, Amitabh Srivastava. ATOM: a flexible interface for building high performance program analysis tools. USENIX Annual Technical Conference, pp. 25-25 (1995).
Erik Berg, Erik Hagersten. SIP: Performance Tuning through Source Code Interdependence. European Conference on Parallel Processing, pp. 177-186 (2002). DOI: 10.1007/3-540-45706-2_22
X. Vera, Jingling Xue. Let's study whole-program cache behaviour analytically. High-Performance Computer Architecture, pp. 175-186 (2002). DOI: 10.1109/HPCA.2002.995708
Bengt Werner, Fredrik Larsson, Peter S. Magnusson, Fredrik Lundholm, Magnus Karlsson, Andreas Moestedt, Per Stenström, Fredrik Dahlgren, Jim Nilsson, Håkan Grahn. SimICS/sun4m: a virtual workstation. USENIX Annual Technical Conference, pp. 10-10 (1998).
Kristof Beyls, Yijun Yu, Erik H. D'Hollander. Visualization enables the programmer to reduce cache misses. IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 781-786 (2002).
John Mellor-Crummey, Robert Fowler, David Whalley. Tools for application-oriented performance tuning. International Conference on Supercomputing, pp. 154-165 (2001). DOI: 10.1145/377792.377826
Trishul M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (PLDI '01), vol. 36, pp. 191-202 (2001). DOI: 10.1145/378795.378840
S. Laha, J.H. Patel, R.K. Iyer. Accurate low-cost methods for performance evaluation of cache memory systems. IEEE Transactions on Computers, vol. 37, pp. 1325-1336 (1988). DOI: 10.1109/12.8699
Mendel Rosenblum, Edouard Bugnion, Scott Devine, Stephen A. Herrod. Using the SimOS machine simulator to study complex computer systems. ACM Transactions on Modeling and Computer Simulation, vol. 7, pp. 78-103 (1997). DOI: 10.1145/244804.244807
Erez Perelman, Greg Hamerly, Michael Van Biesbrouck, Timothy Sherwood, Brad Calder. Using SimPoint for accurate and efficient simulation. Measurement and Modeling of Computer Systems, vol. 31, pp. 318-319 (2003). DOI: 10.1145/781027.781076