A Profiling Method for Analyzing Scalability Bottlenecks on Multicores

Authors: David Eklöv, Nikos Nikoleris, Erik Hagersten

Abstract: To reduce latency and increase bandwidth to memory, modern microprocessors are often designed with deep memory hierarchies that include several levels of caches. For such microprocessors, both the latency and the bandwidth of off-chip memory are typically about two orders of magnitude worse than those of the fastest on-chip cache. Consequently, the performance of many applications is largely determined by how well they utilize the caches and bandwidths in the memory hierarchy. For such applications, there are two principal approaches to improving performance: optimize the memory hierarchy, or optimize the software. In both cases, it is important to understand, both qualitatively and quantitatively, how the software utilizes and interacts with the resources (e.g., caches and bandwidths) in the memory hierarchy.

This thesis presents novel profiling methods for memory-centric analysis. The goal of these methods is to provide general, high-level, quantitative information describing how the profiled applications use the memory hierarchy, and thereby help software and hardware developers identify opportunities for memory-related optimizations. For such techniques to be broadly applicable, the data collection should have minimal impact on the profiled application, while not being dependent on custom hardware and/or operating system extensions. Furthermore, the resulting information should be accurate and easy to interpret.

While use cases are presented, the main focus of this thesis is the design and evaluation of the core profiling methods. These methods measure and estimate high-level metrics, such as miss and fetch ratios, bandwidth demand, and execution rate, as affected by the amount of cache and bandwidth an application receives. This thesis shows that such metrics can be obtained accurately, with very little impact on the profiled application and without requiring costly simulations or custom hardware support.
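The abstract names the miss ratio as a function of cache size as one of its core metrics. The sketch below shows what that metric means by computing it from a memory-access trace with classic LRU stack distances (Mattson's stack algorithm).

This is a swapped-in, trace-driven technique, i.e. the kind of costly simulation the abstract says these profiling methods avoid. It is not the authors' measurement method. It rests on these assumptions:

- a fully associative LRU cache;
- a trace of cache-line addresses;
- for the bandwidth estimate, every miss fetches exactly one 64-byte line, with no prefetching and no write-backs.

All function names are hypothetical.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access, or None for a cold miss.

    The stack distance is the number of distinct lines accessed since the
    previous access to the same line.
    """
    stack = []  # LRU stack: the most recently used line is at the end
    distances = []
    for line in trace:
        if line in stack:
            idx = stack.index(line)
            # Lines above `line` in the stack were touched after it.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)  # first touch: a miss at every cache size
        stack.append(line)
    return distances


def miss_ratio_curve(trace, cache_sizes):
    """Miss ratio of a fully associative LRU cache for each size (in lines)."""
    dists = stack_distances(trace)
    if not dists:
        return {size: 0.0 for size in cache_sizes}
    curve = {}
    for size in cache_sizes:
        # An access hits iff fewer than `size` distinct lines intervened.
        misses = sum(1 for d in dists if d is None or d >= size)
        curve[size] = misses / len(dists)
    return curve


def bandwidth_demand(miss_ratio, accesses_per_second, line_bytes=64):
    """Bytes/s fetched from memory.

    Hypothetical simplification: each miss fetches exactly one line, and
    prefetching and write-backs are ignored.
    """
    return miss_ratio * accesses_per_second * line_bytes


if __name__ == "__main__":
    trace = [0, 1, 2, 0, 1, 2, 3, 0]  # cache-line addresses
    print(miss_ratio_curve(trace, [1, 2, 3, 4]))
    print(bandwidth_demand(0.5, 1e9))
```

For the toy trace above, caches of 1 or 2 lines miss on every access. A 3-line cache reaches a miss ratio of 0.625, and a 4-line cache leaves only the four cold misses (0.5). At 10^9 accesses per second, a miss ratio of 0.5 corresponds to 32 GB/s of fetch bandwidth under the stated assumptions. The list-based stack is quadratic in the worst case and is meant only for illustration.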
