Authors: Gábor Dózsa, Sameer Kumar, Pavan Balaji, Darius Buntinas, David Goodell
DOI: 10.1007/978-3-642-15646-5_2
Keywords: Task (computing), Multi-core processor, Message queue, Network interface, Node (networking), Parallel computing, Shared memory, Critical section, Petascale computing, Computer science
Abstract: With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.