Authors: S. Potluri, A. Venkatesh, D. Bureddy, K. Kandalla, D. K. Panda
Keywords: Computer science, Parallel computing, x86, Coprocessor, Performance per watt, InfiniBand, Xeon Phi, Shared memory, Supercomputer, Operating system, POSIX
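Abstract: Accelerators and coprocessors have become a key component in modern supercomputing systems due to the superior performance per watt that they offer. Intel's Xeon Phi coprocessor packs up to 1 TFLOP of double precision performance in a single chip while providing x86 compatibility and supporting popular programming models like MPI and OpenMP. This makes it an attractive choice for accelerating HPC applications. The Xeon Phi provides several channels for communication between MPI processes running on the coprocessor and the host. While supporting POSIX shared memory within the coprocessor, it exposes a low level API called the Symmetric Communication Interface (SCIF) that gives direct control of the DMA engine to the user. SCIF can also be used for communication between the coprocessor and the host. The Xeon Phi also provides an implementation of the InfiniBand (IB) Verbs interface that enables a direct communication link with the InfiniBand adapter for communication between the coprocessor and the host. In this paper, we propose and evaluate design alternatives for efficient communication on a node with a Xeon Phi coprocessor. We incorporate our designs in the popular MVAPICH2 MPI library. We use shared memory, IB Verbs and SCIF to design a hybrid solution that improves the MPI communication latency from the Xeon Phi to the Host by 70%, for 4MByte messages, compared to an out-of-the-box version of MVAPICH2. Our solution delivers more than 6x improvement in peak uni-directional bandwidth from the Xeon Phi to the Host and more than 3x improvement in bi-directional bandwidth. Through our designs, we are able to improve the performance of 16 process Gather, Alltoall and Allgather collective operations by 70%, 85% and 80%, respectively, for 4MB messages. We further evaluate our designs using application benchmarks and show improvements of up to 18% with a 3D Stencil communication kernel and up to 11.5% with the P3DFFT library.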