Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors

Authors: Ye Zhang, L. Rauchwerger, J. Torrellas

DOI: 10.1109/HPCA.1998.650556

Abstract: Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as a dependence is detected, execution stops, the state is restored, and the code is re-executed serially. This scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. The scheme requires hardware extensions to the cache coherence protocol and memory hierarchy of a DSM. It has low overhead. We present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 for 16 processors and is 50% faster than a related software-only method.
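The speculate, detect, restore, re-execute cycle described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the paper's hardware design: the names `run_speculative`, `DependenceViolation`, `double_in_place`, and `running_sum_step` are hypothetical, iterations run one after another in a shuffled order rather than truly in parallel, and access tracking is done per array element in Python dictionaries instead of in the cache coherence protocol. The check is also deliberately conservative: any element touched by two different iterations, with at least one of them writing, counts as a violation.

```python
class DependenceViolation(Exception):
    """Raised when two iterations conflict on the same element (assumed rule)."""


def run_speculative(data, order, body):
    """Run body(i, read, write) for each i in `order`; fall back to serial on conflict."""
    checkpoint = list(data)   # saved state, restored if speculation fails
    writer = {}               # element index -> iteration that wrote it
    readers = {}              # element index -> set of iterations that read it

    def accessors(i):
        def read(idx):
            if idx in writer and writer[idx] != i:
                raise DependenceViolation(idx)   # value produced by another iteration
            readers.setdefault(idx, set()).add(i)
            return data[idx]

        def write(idx, value):
            if idx in writer and writer[idx] != i:
                raise DependenceViolation(idx)   # two iterations write the same element
            if readers.get(idx, set()) - {i}:
                raise DependenceViolation(idx)   # another iteration already read it
            writer[idx] = i
            data[idx] = value

        return read, write

    try:
        for i in order:                          # stands in for out-of-order execution
            body(i, *accessors(i))
        return "parallel"
    except DependenceViolation:
        data[:] = checkpoint                     # restore the pre-loop state

        def plain_read(idx):
            return data[idx]

        def plain_write(idx, value):
            data[idx] = value

        for i in sorted(order):                  # re-execute serially, in program order
            body(i, plain_read, plain_write)
        return "serial"


def double_in_place(i, read, write):
    """Independent iterations: each one touches only element i."""
    write(i, read(i) * 2)


def running_sum_step(i, read, write):
    """Dependent iterations: iteration i needs the result of iteration i - 1."""
    if i > 0:
        write(i, read(i - 1) + read(i))
```

With `a = [1, 2, 3, 4]`, calling `run_speculative(a, [2, 0, 3, 1], double_in_place)` finds no conflict and keeps the speculative result. With `b = [1, 1, 1, 1]`, calling `run_speculative(b, [3, 1, 2, 0], running_sum_step)` hits a cross-iteration dependence, restores `b`, and recomputes it in program order.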
