Authors: Kourosh Gharachorloo, Vijayaraghavan Soundararajan, John Hennessy, Mark Heinrich, Ben Verghese
DOI:
Keywords: Interleaved memory, Uniform memory access, Distributed shared memory, Non-uniform memory access, Memory controller, Computer science, Computer architecture, Distributed memory, Cache-only memory architecture, Memory map
Abstract: Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines. The most important characteristic of the CC-NUMA architecture is that the latency to access data on a remote node is considerably larger than the latency to access local memory. On such machines, good data locality can reduce memory stall time and is therefore a critical factor in application performance. In this paper we study the various options available to system designers to transparently decrease the fraction of data misses serviced remotely. This work is done in the context of the Stanford FLASH multiprocessor. FLASH is unique in that each node has a single pool of DRAM that can be used in a variety of ways by the programmable memory controller. We use this programmability to explore different options for cache coherence and data locality in compute-server workloads. First, we consider two protocols for providing base cache coherence, one with centralized directory information (dynamic pointer allocation) and another with distributed directory information (SCI). While several commercial systems are based on SCI, we find that the centralized scheme has superior performance. Next, we consider different hardware and software techniques that use some or all of the local memory in a node to improve data locality. Finally, we propose a hybrid scheme that combines hardware and software techniques. These schemes run on the same platform with both user and kernel references from the workloads, so the paper offers a realistic and fair comparison of replication/migration techniques that has not previously been feasible.
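The abstract's central claim, that lowering the fraction of misses serviced remotely reduces memory stall time, can be illustrated with a simple expected-latency model. This is a minimal Python sketch, not the paper's methodology: the function names are hypothetical, and the latencies and miss count are placeholder values, not measurements from FLASH.

```python
def average_miss_latency(remote_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected latency of one cache miss, given the fraction serviced by a remote node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in [0, 1]")
    # Weighted average of local and remote service latency.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def memory_stall_time_ns(misses: int, remote_fraction: float,
                         local_ns: float, remote_ns: float) -> float:
    """Total stall time (ns) if every miss stalls the processor for its full latency."""
    return misses * average_miss_latency(remote_fraction, local_ns, remote_ns)


# Hypothetical placeholder numbers, chosen only to show the trend.
LOCAL_NS = 200.0
REMOTE_NS = 600.0
MISSES = 1_000_000

for fraction in (0.8, 0.5, 0.2):
    stall_ms = memory_stall_time_ns(MISSES, fraction, LOCAL_NS, REMOTE_NS) / 1e6
    print(f"remote fraction {fraction:.0%}: stall time {stall_ms:.0f} ms")
```

Under this simplified model, each miss converted from remote to local saves `REMOTE_NS - LOCAL_NS` of stall time, which is the effect that transparent replication and migration aim to achieve.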