Comparative evaluation of latency reducing and tolerating techniques

Authors: Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, Wolf-Dietrich Weber

DOI: 10.1145/115952.115978

Keywords: Computer science; Comparative evaluation; Multiprocessing; Latency (engineering); Distributed computing

Abstract: Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware-coherent caches, (ii) relaxed consistency, (iii) software-controlled prefetching, and (iv) multiple-context support. While some studies of the benefits of the individual techniques have been done, no study evaluates all of the techniques within a consistent framework. This paper attempts to remedy this by providing a comprehensive evaluation of the techniques, both individually and in combinations, using a consistent set of architectural assumptions. The results were obtained through detailed simulations of a large-scale shared-memory multiprocessor. Our results show that caches and relaxed consistency uniformly improve performance. The improvements due to prefetching and multiple contexts are sizeable, but much more application-dependent. Combinations of the various techniques generally attain better performance than each one on its own. Overall, we show that with suitable combinations of the techniques, performance can be improved by 4 to 7 times.
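The abstract names software-controlled prefetching without describing it. As a generic illustration of the idea only (not the prefetch mechanism or workloads evaluated in the paper), the sketch below issues a non-binding prefetch hint a fixed distance ahead of the element currently being consumed, so that the memory access overlaps with useful work. It assumes a GCC or Clang compiler for the `__builtin_prefetch` builtin; the function name `sum_with_prefetch` and the distance of 16 elements are hypothetical choices made for this example.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. In practice it would be
 * tuned so that a prefetch issued now completes roughly when the element is used. */
#define PREFETCH_DISTANCE 16

/* Sum an array while issuing a prefetch hint for the element that will be
 * needed PREFETCH_DISTANCE iterations later. __builtin_prefetch (GCC/Clang)
 * is only a hint: it never faults and does not change the computed result.
 * This simple version has no prologue, so the first PREFETCH_DISTANCE
 * elements are not prefetched. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 requests a read; locality = 3 asks to keep the line cached */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

Because the prefetch is non-binding, correctness does not depend on whether the hint is honored; only the timing of the loads changes. Choosing the distance is the usual trade-off: too short and the latency is not fully hidden, too long and the prefetched line may be evicted before use.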
