High Throughput Intra-Node MPI Communication with Open-MX

Author: Brice Goglin

DOI: 10.1109/PDP.2009.20

Keywords: Message passing, Node (networking), Data transmission, Computer science, Memory management, Supercomputer, Computer network, Cache, Stack (abstract data type), Throughput

Abstract: The increasing number of cores per node in high-performance computing requires an efficient intra-node MPI communication subsystem. Most existing implementations rely on two copies across a shared memory-mapped file. Open-MX offers a single-copy mechanism that is tightly integrated in its regular stack, making it transparently available to the MX backend of many MPI layers. We describe this implementation and its copy offload using I/OAT hardware. Memory pinning requirements are then discussed, and an overlapped pinning is introduced to let the Open-MX data transfer start earlier. The performance evaluation shows that this local communication stack performs better than MPICH2 and Open MPI for large messages, reaching up to 70% higher throughput in micro-benchmarks when using copy offload. Thanks to only a single memory copy being involved, the performance also does not heavily depend on cache sharing between processing cores, making these improvements easier to observe in real applications.
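As an illustration of the two-copy scheme that the abstract contrasts with Open-MX's single-copy mechanism, the following minimal C sketch (an assumption for illustration only, not taken from the paper or the Open-MX code base) moves a message between two processes through a shared anonymous mapping, which stands in for the shared memory-mapped file: the sender copies into the shared segment and the receiver copies out of it. The single-copy approach described in the paper would replace these two memcpy calls with one kernel-level copy from the sender's pages to the receiver's buffer, optionally offloaded to an I/OAT DMA engine.

#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MSG_SIZE 4096            /* hypothetical message size for the demo */

int main(void)
{
    /* Shared segment standing in for the shared memory-mapped file. */
    char *shared = mmap(NULL, MSG_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                        /* sender process */
        char sendbuf[MSG_SIZE];
        memset(sendbuf, 'x', MSG_SIZE);
        memcpy(shared, sendbuf, MSG_SIZE); /* copy #1: sender buffer -> shared segment */
        _exit(0);
    }

    waitpid(pid, NULL, 0);                 /* crude synchronization, enough for a sketch */

    char recvbuf[MSG_SIZE];
    memcpy(recvbuf, shared, MSG_SIZE);     /* copy #2: shared segment -> receiver buffer */
    printf("received %d bytes, first byte '%c'\n", MSG_SIZE, recvbuf[0]);

    munmap(shared, MSG_SIZE);
    return 0;
}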
