Tolerating latency through software-controlled prefetching in shared-memory multiprocessors

Authors: Todd Mowry, Anoop Gupta

DOI: 10.1016/0743-7315(91)90014-Z

Keywords: Consistency model, Computer science, Cache, Shared memory, Data access, Parallel computing, Linked list, Instruction prefetch, Multiprocessing

Abstract: The large latency of memory accesses is a major obstacle to obtaining high processor utilization in large-scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they significantly lower performance. In this paper we evaluate the effectiveness of nonbinding software-controlled prefetching, as proposed for the Stanford DASH multiprocessor, in addressing this problem. The prefetches are nonbinding in the sense that the prefetched data are brought to a cache close to the processor but remain visible to the cache-coherence protocol, which keeps them consistent. Prefetching is software-controlled because the program must explicitly issue prefetch instructions. The paper presents results from detailed simulation studies done in the context of the DASH multiprocessor. Our results show that for applications with regular data access patterns (we evaluate a particle-based simulator used in aeronautics and an LU-decomposition application), prefetching can be very effective. It was easy to augment these applications to prefetch, and their performance increased by 100–150% when data were prefetched directly into the processor's cache. However, for applications with complex data usage patterns, prefetching was less successful. After much effort, the performance of a distributed-time logic simulation application that made extensive use of pointers and linked lists could be improved by only 30%. The paper also evaluates the effects on prefetching of various hardware optimizations, such as separate prefetch issue buffers, prefetching with exclusive ownership, lockup-free caches, and weaker memory consistency models.
