Authors: Ye Zhang, L. Rauchwerger, J. Torrellas
Keywords:
Abstract: Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol to detect any dependence violations. As soon as a violation is detected, execution stops, the state is restored, and the code is re-executed serially. The scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. It requires hardware extensions to the cache coherence protocol and the memory hierarchy of a DSM, and it has low overhead. We present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 on 16 processors and is 50% faster than a related software-only method.