A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D

Authors: Vijay Karamcheti, Andrew A. Chien

DOI: 10.1145/223982.224440

Abstract: Programming models based on messaging continue to be important for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines, the TMC CM-5 and the Cray T3D, by exploring the design and performance of several messaging implementations. The additional features of the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch.

Experiments show that requiring processor involvement for message reception can increase communication overheads by 60% to 300% for moderate variations in computation grain size at the destination. In contrast, hardware support for remote operations decouples message reception from processor activity, producing high-performance messaging that is independent of computation grain size or variability.

In addition, the shared address space is used to solve the output contention problem (output hot spots), producing messaging implementations that are robust over a wide variety of traffic patterns. Atomic swap is used to build a distributed message queue, enabling a "pull" messaging scheme in which the destination requests the data transfer upon message receive. This scheme uses prefetches to mask the receive latency. While this yields robust performance under output contention, its base cost is competitive only for small messages (up to 64 bytes) because of the high cost of issuing and resolving prefetches on the T3D. Emulation shows that if the cost of prefetch interaction is reduced by a factor of eight (from 250 ns to 31 ns), perhaps by moving the prefetch queue on chip, there is a corresponding increase in the message size for which the pull scheme gives superior performance in all cases.
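
The "pull" scheme built on atomic swap can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SwapCell`, `Descriptor`, and `PullQueue` are hypothetical, a lock stands in for the hardware atomic swap, and a plain local read stands in for the remote read or prefetch that the destination would issue. The sketch assumes senders enqueue only a small descriptor with a single swap on the queue tail, while the payload stays with the sender until the destination pulls it.

```python
import threading


class SwapCell:
    """One memory word with an atomic swap (emulated here with a lock)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def swap(self, new):
        # Atomically store `new` and return the previous contents.
        with self._lock:
            old, self._value = self._value, new
            return old


class Descriptor:
    """Small record announcing a message; the payload stays with the sender."""

    def __init__(self, sender, payload):
        self.sender = sender
        self.payload = payload  # assumption: resides in sender memory until pulled
        self.next = None


class PullQueue:
    """Hypothetical per-destination queue of message descriptors."""

    def __init__(self):
        self.head = Descriptor(None, None)  # dummy node, owned by the receiver
        self.tail = SwapCell(self.head)

    def announce(self, desc):
        # Sender side: one atomic swap claims the tail position,
        # then the previous tail is linked to the new descriptor.
        prev = self.tail.swap(desc)
        prev.next = desc

    def pull_all(self):
        # Receiver side: walk the linked descriptors and fetch each payload.
        # On real hardware this fetch would be a remote read or prefetch.
        out = []
        while self.head.next is not None:
            self.head = self.head.next  # consumed node becomes the new dummy
            out.append((self.head.sender, self.head.payload))
        return out


if __name__ == "__main__":
    q = PullQueue()
    q.announce(Descriptor(3, "hello"))
    q.announce(Descriptor(7, "world"))
    print(q.pull_all())
```

Because the destination decides when to fetch each payload, many senders targeting one node contend only on the single swap, not on delivering bulk data, which is the intuition behind the robustness under output hot spots that the abstract reports.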
