Architectural support for compiler-generated data-parallel programs

Author: Alexander C. Klaiber


Keywords: Shared memory, Parallel computing, Overhead (computing), Operating system, Message passing, Synchronization (computer science), Parallel language, Compiler, Programmer, Network interface, Computer science

Abstract: Fully realizing the advantages of parallel processing demands the design of efficient communication mechanisms. Existing architectures span a spectrum ranging from message passing to remote-memory access, shared-memory, and cache-only architectures. These are often used (and designed to be used) directly by the programmer. However, in the future we can expect more programs to be written in high-level languages and compiled for a specific target; the compiler will hide the details of the underlying hardware. The architecture should then be designed with the compiler, not the programmer, in mind. The goal of our work is to improve performance for such a combination of language and architecture. To make this task manageable, we focus on the class of data-parallel languages and pick C$\sp*$ as one representative for our experiments. We evaluate three competing architectures--message-passing, remote-memory access, and cache-coherent shared-memory--for a set of benchmarks on their respective programming models. We show that the message-passing model has several inherent advantages for these benchmarks, resulting in less interconnect traffic and less time spent waiting for messages to traverse the interconnect. On the other hand, it requires the CPU to perform significantly more work per message than the other models; this overhead destroys much of the model's advantage. We propose a language-oriented architecture that retains the message-passing model, yet (in cooperation with the compiler) reduces the per-message overhead. To do so, we first identify a small set of low-level synchronization primitives well matched to the needs of C$\sp*$ and a network interface tuned to them; we describe the compilation of C$\sp*$ onto this base, which includes remote read/write requests plus counter-based synchronization support. We simulate and measure both the proposed and a traditional design; the measurements demonstrate that the proposed interface is effective at reducing communication-related overhead.
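To make the abstract's "remote read/write requests plus counter-based synchronization support" concrete, here is a minimal sketch in C of how such a counter-based scheme could work: each node keeps a counter that the network interface increments once per arriving remote write, and the receiver waits until the counter reaches the number of messages it expects for the current phase. The names (on_remote_write, wait_for_messages, node_state_t) and the spin-wait are illustrative assumptions, not the dissertation's actual interface.

```c
/* Hedged sketch of counter-based message synchronization, assuming a
 * network interface that invokes a handler for each arriving remote write. */
#include <stdatomic.h>

typedef struct {
    atomic_int msg_counter;   /* incremented once per arriving remote write */
} node_state_t;

/* Handler the (hypothetical) network interface runs on message arrival:
 * deposit the payload at its destination, then bump the counter. */
void on_remote_write(node_state_t *node, double *dst, double value)
{
    *dst = value;                                         /* data */
    atomic_fetch_add_explicit(&node->msg_counter, 1,
                              memory_order_release);      /* signal */
}

/* Counter-based wait: spin until 'expected' messages have arrived, then
 * reset the counter for the next communication phase. */
void wait_for_messages(node_state_t *node, int expected)
{
    while (atomic_load_explicit(&node->msg_counter,
                                memory_order_acquire) < expected)
        ;  /* a real interface might poll the network or take an interrupt */
    atomic_store_explicit(&node->msg_counter, 0, memory_order_relaxed);
}
```

The point of such a primitive is that the compiler, which knows how many messages each node sends and receives per data-parallel phase, can emit the expected count directly, so no handshaking or acknowledgment messages are needed on the critical path.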
