Software-oriented distributed shared cache management for chip multiprocessors

Authors: Sangyeun Cho, Lei Jin

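Abstract: This thesis proposes a software-oriented distributed shared cache management approach for chip multiprocessors (CMPs). Unlike hardware-based schemes, our approach offloads the cache management task to a trace analysis phase, allowing flexible management strategies. For single-threaded programs, a static 2D page coloring scheme is proposed to utilize oracle trace information to derive an optimal data placement schema for a program. In addition, a dynamic 2D page coloring scheme is proposed as a practical solution, which tries to approach the performance of the static scheme. The evaluation results show that the static scheme achieves a 44.7% performance improvement over the conventional scheme on average, while the dynamic scheme performs 32.3% better than the conventional scheme. For latency-oriented multithreaded programs, a pattern recognition algorithm based on the K-means clustering method is introduced. The algorithm tries to identify data access patterns that can be utilized to guide the placement of private data and the replication of shared data. The experimental results show that these patterns lead to a 19% performance improvement. The reduced remote cache accesses and aggregated cache miss rate result in much lower bandwidth requirements for the on-chip network and the off-chip main memory bus. Lastly, for throughput-oriented multithreaded programs, we propose a hint-guided data replication scheme that identifies the instructions of a target program that access data with a high reuse property. The derived hints are then used to guide data replication at run time. By balancing the amount of data replication against local cache pressure, the scheme has the potential to help achieve performance comparable to the best existing schemes. Our software-oriented approach is an effective way to manage distributed shared caches on CMPs, and it provides an alternative research direction for this problem. Given the known difficulties (e.g., scalability and design complexity) that hardware-based schemes face, this approach may receive serious consideration from researchers in the future. From this perspective, the thesis makes valuable contributions to the computer architecture community.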
