Authors: Vijay Karamcheti, Andrew A. Chien
Keywords:
Abstract: Programming models based on messaging continue to be important for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines, the TMC CM-5 and the Cray T3D, by exploring the design and performance of several messaging implementations. The additional features of the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch. Experiments on the CM-5 show that requiring processor involvement in message reception can increase communication overheads from 60% to 300% for moderate variations in computation grain size at the destination. In contrast, T3D hardware support for remote operations decouples message reception from processor activity, producing high-performance messaging independent of computation grain size or variability. In addition, the shared address space can be used to solve the output contention problem (output hot spots), producing messaging implementations that are robust over a wide variety of traffic patterns. Atomic swap can be used to build a distributed message queue, enabling a "pull" messaging scheme where the destination requests the data transfer upon message receive. This scheme uses prefetches to mask the receive latency. While this yields performance that is robust to output contention, its base cost is competitive only for small messages (up to 64 bytes) because of the high cost of issuing and resolving prefetches in the T3D. Emulation shows that if the cost of prefetch interaction is reduced by a factor of eight (250ns to 31ns), perhaps by moving the prefetch queue on chip, there is a corresponding [...] size, and pull messaging can give superior performance in all cases.
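The pull scheme mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `AtomicCell`, `Descriptor`, and `Node` are hypothetical, the atomic swap is emulated with a lock, and the "pull" is a plain dictionary lookup rather than a T3D prefetch, so no latency or cost is modeled. It shows one common way an atomic swap on a tail pointer can serialize enqueues from many senders, while the payload stays in the sender's memory until the destination asks for it.

```python
import threading


class AtomicCell:
    """Stand-in for a memory word with atomic swap (hypothetical helper)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def swap(self, new):
        # Store `new` and return the previous value as one indivisible step.
        with self._lock:
            old, self._value = self._value, new
            return old


class Descriptor:
    """Small queue entry; the payload itself stays at the sender."""

    def __init__(self, sender, key):
        self.sender = sender
        self.key = key
        self.next = None


class Node:
    """A processing node with an outbox and an incoming descriptor queue."""

    def __init__(self, name):
        self.name = name
        self.outbox = {}                  # payloads held in sender memory
        stub = Descriptor(None, None)     # dummy entry so the queue is never empty
        self.head = stub                  # last descriptor already consumed
        self.tail = AtomicCell(stub)      # word that senders swap on

    def send(self, dest, key, payload):
        # Keep the data locally and enqueue only a descriptor at the destination.
        self.outbox[key] = payload
        desc = Descriptor(self, key)
        prev = dest.tail.swap(desc)       # the swap gives each sender a unique predecessor
        prev.next = desc                  # link behind that predecessor

    def receive_all(self):
        # Walk the linked descriptors and pull each payload from its sender.
        received = []
        while self.head.next is not None:
            desc = self.head.next
            received.append(desc.sender.outbox.pop(desc.key))
            self.head = desc
        return received


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    a.send(c, 1, "x")
    b.send(c, 7, "y")
    print(c.receive_all())  # prints ['x', 'y']
```

Because senders write only a small descriptor at the destination, a burst of traffic toward one node queues descriptors rather than full messages, which is the intuition behind the robustness to output hot spots that the abstract reports.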