Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors

Authors: Ye Zhang, L. Rauchwerger, J. Torrellas

DOI: 10.1109/HPCA.1998.650556

Abstract: Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol to detect any dependence violations. As soon as a dependence is detected, execution stops, the state is restored, and the code is re-executed serially. This scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. The scheme requires extensions to the cache coherence protocol and memory hierarchy of a DSM. It has low overhead. We present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 for 16 processors and is 50% faster than a related software-only method.
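
The cycle described in the abstract (run the loop speculatively, detect a cross-iteration dependence, restore state, re-execute serially) can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the hardware design of the paper: the per-element bookkeeping that the paper places in the cache coherence protocol is modelled here as ordinary dictionaries, out-of-order parallel execution is imitated by running iterations in a shuffled order, and the conflict check is deliberately conservative (any element touched by two iterations, with at least one write, counts as a violation). All names (`TrackedArray`, `run_speculative`, `independent_body`, `chained_body`) are hypothetical.

```python
import random


class DependenceViolation(Exception):
    """Raised when two different iterations conflict on one array element."""


class TrackedArray:
    """Hypothetical wrapper recording which iteration touched each element.

    Assumption: this software state stands in for the access information
    that the paper keeps in hardware.
    """

    def __init__(self, data, speculative):
        self.data = data
        self.speculative = speculative
        self.current = None   # iteration currently executing
        self.writer = {}      # element index -> iteration that wrote it
        self.readers = {}     # element index -> set of iterations that read it

    def read(self, i):
        if self.speculative:
            w = self.writer.get(i)
            if w is not None and w != self.current:
                raise DependenceViolation(f"element {i} written by iteration {w}")
            self.readers.setdefault(i, set()).add(self.current)
        return self.data[i]

    def write(self, i, value):
        if self.speculative:
            w = self.writer.get(i)
            other_readers = self.readers.get(i, set()) - {self.current}
            if (w is not None and w != self.current) or other_readers:
                raise DependenceViolation(f"element {i} shared across iterations")
            self.writer[i] = self.current
        self.data[i] = value


def run_speculative(data, body, n_iters, seed=0):
    """Run body(it, array) for each iteration; return 'parallel' or 'serial'."""
    backup = list(data)                      # checkpoint for recovery
    order = list(range(n_iters))
    random.Random(seed).shuffle(order)       # stand-in for out-of-order execution
    tracked = TrackedArray(data, speculative=True)
    try:
        for it in order:
            tracked.current = it
            body(it, tracked)
        return "parallel"
    except DependenceViolation:
        data[:] = backup                     # restore state
        plain = TrackedArray(data, speculative=False)
        for it in range(n_iters):            # re-execute serially, in order
            body(it, plain)
        return "serial"


def independent_body(it, a):
    # Each iteration touches only its own element: no cross-iteration dependence.
    a.write(it, a.read(it) * 2)


def chained_body(it, a):
    # Iteration it writes the element that iteration it + 1 reads.
    a.write(it + 1, a.read(it) + 1)


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    print(run_speculative(a, independent_body, 4), a)
    b = [0, 0, 0, 0]
    print(run_speculative(b, chained_body, 3), b)
```

In this sketch the first loop commits its speculative result, while the second triggers a violation regardless of the shuffled order, because neighbouring iterations share an element; it is then rolled back and rerun in program order. A real scheme would detect violations concurrently and at much finer cost, which is the point of the hardware support the abstract proposes.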
