Authors: Todd Mowry, Anoop Gupta
DOI: 10.1016/0743-7315(91)90014-Z
Keywords: Consistency model, Computer science, Cache, Shared memory, Data access, Parallel computing, Linked list, Instruction prefetch, Multiprocessing
Abstract: The large latency of memory accesses is a major obstacle to obtaining high processor utilization in large-scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they significantly lower performance. In this paper we evaluate the effectiveness of nonbinding, software-controlled prefetching, as proposed for the Stanford DASH multiprocessor, in addressing this problem. The prefetches are nonbinding in the sense that the prefetched data are brought close to the processor but remain available to the cache-coherence protocol, which keeps them consistent. The prefetching is software-controlled since the program must explicitly issue prefetch instructions. The paper presents results from detailed simulation studies done in the context of the DASH multiprocessor. Our results show that for applications with regular data access patterns (we study a particle-based simulator used in aeronautics and an LU-decomposition application), prefetching can be very effective. It was easy to augment these applications to do prefetching, and their performance increased by 100–150% when prefetching directly into the processor's cache. However, for applications with complex data usage patterns, prefetching was less successful. After much effort, the performance of a distributed-time logic simulation application that made extensive use of pointers and linked lists could be improved by only 30%. The paper also evaluates the effects of various hardware optimizations, such as separate prefetch buffers, prefetching with exclusive ownership, lockup-free caches, and weaker consistency models, on the performance of prefetching.
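The contrast the abstract draws between regular and pointer-based access patterns can be illustrated with a minimal sketch. The C fragment below uses GCC's `__builtin_prefetch` intrinsic as a modern stand-in for the nonbinding prefetch instruction evaluated in the paper; the prefetch distance `PREFETCH_AHEAD`, the function names, and the data structures are illustrative assumptions, not taken from the paper or its benchmarks.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16  /* assumed prefetch distance, in array elements */

/* Regular access pattern: the address of a[i + PREFETCH_AHEAD] is known
 * far in advance, so a nonbinding prefetch can be issued early enough to
 * hide most of the miss latency. */
double sum_array(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], /*rw=*/0, /*locality=*/3);
        sum += a[i];
    }
    return sum;
}

/* Irregular, pointer-chasing pattern: the next node's address is known only
 * after the current node has been loaded, so a prefetch can reach at most
 * one node ahead and hides far less latency. */
struct node {
    double value;
    struct node *next;
};

double sum_list(const struct node *p)
{
    double sum = 0.0;
    while (p != NULL) {
        if (p->next != NULL)
            __builtin_prefetch(p->next, 0, 3);
        sum += p->value;
        p = p->next;
    }
    return sum;
}
```

This mirrors the paper's finding: codes like LU decomposition, whose future addresses are computable well ahead of use, gained 100–150%, while the linked-list-heavy logic simulator improved by only about 30%.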