Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Authors: Kourosh Gharachorloo, Vijayaraghavan Soundararajan, John Hennessy, Mark Heinrich, Ben Verghese

Keywords: Interleaved memory, Uniform memory access, Distributed shared memory, Non-uniform memory access, Memory controller, Computer science, Computer architecture, Distributed memory, Cache-only memory architecture, Memory map

Abstract: Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines. The most important characteristic of the CC-NUMA architecture is that the latency to access data on a remote node is considerably larger than the latency to access local memory. On such machines, good data locality can reduce memory stall time and is therefore a critical factor in application performance. In this paper we study the various options available to system designers to transparently decrease the fraction of data misses serviced remotely. This work is done in the context of the Stanford FLASH multiprocessor. FLASH is unique in that each node has a single pool of DRAM that can be used in a variety of ways by the programmable memory controller. We use the programmability of FLASH to explore different options for cache coherence and data locality in compute-server workloads. First, we consider two protocols for providing base cache coherence, one with centralized directory information (dynamic pointer allocation) and another with distributed directory information (SCI). While several commercial systems are based on SCI, we find that a centralized scheme has superior performance. Next, we consider different hardware and software techniques that use some or all of the local memory in a node to improve data locality. Finally, we propose a hybrid scheme that combines hardware and software techniques. These schemes work on the same base platform with both user and kernel references from the workloads, and the paper thus offers a realistic and fair comparison of replication/migration techniques that has not previously been feasible.
