Managing Wire Delay in Large Chip-Multiprocessor Caches

Authors: B.M. Beckmann, D.A. Wood

DOI: 10.1109/MICRO.2004.21

Abstract: In response to increasing (relative) wire delay, architects have proposed various technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block migration (e.g., D-NUCA and NuRapid) reduces average hit latency by migrating frequently used blocks towards lower-latency banks. Transmission Line Caches (TLC) use on-chip transmission lines to provide low latency to all banks. Traditional stride-based hardware prefetching strives to tolerate, rather than reduce, latency. Chip multiprocessors (CMPs) present additional challenges. First, CMPs often share the on-chip L2 cache, requiring multiple ports to provide sufficient bandwidth. Second, multiple threads mean multiple working sets, which compete for limited on-chip storage. Third, sharing code and data interferes with block migration, since one processor's low-latency bank is another processor's high-latency bank. In this paper, we develop L2 cache designs for CMPs that incorporate these three latency management techniques. We use detailed full-system simulation to analyze the performance trade-offs for both commercial and scientific workloads. First, we demonstrate that block migration is less effective for CMPs because 40-60% of L2 cache hits in commercial workloads are satisfied in the central banks, which are equally far from all processors. Second, we observe that although transmission lines provide low latency, contention for their restricted bandwidth limits their performance. Third, we show that stride-based prefetching between L1 and L2 caches alone improves performance at least as much as the other two techniques. Finally, we present a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.
