A Profiling Method for Analyzing Scalability Bottlenecks on Multicores

Authors: David Eklöv, Nikos Nikoleris, Erik Hagersten

Abstract: To reduce latency and increase bandwidth to memory, modern microprocessors are often designed with deep memory hierarchies including several levels of caches. For such microprocessors, both the latency and the bandwidth to off-chip memory are typically about two orders of magnitude worse than those of the fastest on-chip cache. Consequently, the performance of many applications is largely determined by how well they utilize the caches and bandwidths in the memory hierarchy. For such applications, the principal approaches to improving performance are to optimize the memory hierarchy and to optimize the software. In both cases, it is important to understand, qualitatively and quantitatively, how the software utilizes and interacts with the resources (e.g., caches and bandwidths) in the memory hierarchy.

This thesis presents novel profiling methods for memory-centric software performance analysis. The goal of these methods is to provide general, high-level, quantitative information describing how the profiled applications utilize the resources in the memory hierarchy, and thereby help software and hardware developers identify opportunities for memory-related optimizations. For such techniques to be broadly applicable, the data collection should have minimal impact on the profiled application, while not being dependent on custom hardware and/or operating system extensions. Furthermore, the resulting profiling information should be accurate and easy to interpret.

While several use cases are presented, the main focus of this thesis is the design and evaluation of the core profiling methods. These methods measure and/or estimate how high-level performance metrics, such as miss and fetch ratios, bandwidth demand, and execution rate, are affected by the amount of resources the profiled applications receive. This thesis shows that such high-level profiling information can be accurately obtained with very little impact on the profiled applications and without requiring costly simulations or custom hardware support.
