High-Performance Communication Primitives and Data Structures on Message-Passing Manycores: Broadcast and Map

Author: Omid Shahmirzadi

Abstract: The constant increase in single-core frequency reached a plateau in recent years. This is due to a physical phenomenon known as the power wall: the heat produced inside the chip is so high that it cannot be cooled down by existing technologies. An alternative way to harvest more computational power per die is to fabricate a higher number of cores into a single chip. Therefore, manycore chips with more than a thousand cores are expected by the end of the decade. These environments provide a high level of parallel processing while their energy consumption is considerably lower than that of their multi-chip counterparts. Although shared-memory programming is the classical paradigm for programming these environments, there are numerous claims that, taking into account the full life cycle of software, the message-passing programming model has advantages. The direct architectural consequence of applying the message-passing model is to support message passing between processing entities directly in hardware. Therefore, manycore architectures with hardware support for message passing are becoming more and more visible. These platforms can be seen in two ways: (i) as a High Performance Computing (HPC) cluster programmed by highly trained scientists using Message Passing Interface (MPI) libraries; or (ii) as a mainstream computing platform requiring a global operating system to abstract away the complexities of the platform from the ordinary programmer. In the first view, the performance of communication primitives is an important bottleneck for MPI applications. In the second view, the performance of kernel data structures has been shown to be a limiting factor. In this thesis we overview state-of-the-art techniques to circumvent these bottlenecks, and we study a high-performance broadcast primitive and a map data structure on modern message-passing manycore architectures with hardware support for message passing, in separate chapters. In one chapter, we study how to make use of the platform's hardware features to implement an efficient broadcast primitive. We consider the Intel Single-chip Cloud Computer (SCC) as our target platform, which offers the ability to move data between on-chip Message Passing Buffers (MPB) using Remote Memory Access (RMA). We propose OC-Bcast (On-Chip Broadcast), a pipelined k-ary tree algorithm tailored to exploit the parallelism provided by on-chip RMA. Experimental results show that OC-Bcast attains better performance in terms of both latency and throughput compared to existing solutions. This improvement highlights the benefits of exploiting the hardware features of the platform: our algorithm takes advantage of RMA, unlike other solutions, which are based on a higher-level send/receive interface. In the other chapter, we study the implementation of high-throughput concurrent maps
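The abstract names the shape of OC-Bcast (a pipelined k-ary tree) but not its details. The following Python sketch is a generic step-count model of that shape, not the thesis's algorithm. It assumes a heap-style rank layout (the children of rank r are k·r+1 to k·r+k) and a message split into m chunks. It also assumes that all children of a node can fetch the same chunk in one step, which loosely stands in for the parallelism provided by on-chip RMA. The function names and the cost model are hypothetical.

```python
"""Step-count model of a pipelined k-ary tree broadcast (illustrative sketch)."""
from typing import List, Optional


def parent(rank: int, k: int) -> Optional[int]:
    """Parent of `rank` in a k-ary tree rooted at rank 0 (None for the root)."""
    return None if rank == 0 else (rank - 1) // k


def children(rank: int, k: int, n: int) -> List[int]:
    """Children of `rank` among n ranks: k*rank+1 .. k*rank+k, clipped to n."""
    first = k * rank + 1
    return list(range(first, min(first + k, n)))


def depth(rank: int, k: int) -> int:
    """Number of hops from `rank` up to the root."""
    d = 0
    while rank != 0:
        rank = parent(rank, k)
        d += 1
    return d


def simulate(n: int, k: int, m: int) -> int:
    """Steps until all n ranks hold all m chunks (assumes m >= 1).

    Model assumptions (hypothetical, not taken from the thesis):
    - one step moves one chunk over every tree edge that has work to do;
    - all children of a node can fetch the same chunk in the same step;
    - chunks travel in order, so a node forwards chunk c while it is
      still waiting for chunk c + 1 (this is the pipelining).
    """
    have = [0] * n  # have[r] = number of chunks rank r holds so far
    have[0] = m     # the root starts with the whole message
    steps = 0
    while min(have) < m:
        steps += 1
        start = have[:]  # every transfer in a step sees the state at step start
        for r in range(1, n):
            p = parent(r, k)
            if start[p] > start[r]:
                have[r] = start[r] + 1  # fetch the next chunk from the parent
    return steps


def predicted_steps(n: int, k: int, m: int) -> int:
    """Closed form under the same model: (m - 1) + tree depth."""
    if n == 1:
        return 0
    return (m - 1) + depth(n - 1, k)  # rank n-1 is the deepest in this layout


if __name__ == "__main__":
    n, m = 48, 10  # 48 cores, as on the SCC; m = 10 chunks is arbitrary
    for k in (2, 7):
        print(f"k={k}: simulated={simulate(n, k, m)}, "
              f"predicted={predicted_steps(n, k, m)}, "
              f"no pipelining={m * depth(n - 1, k)}")
```

Under this model, pipelining reduces the m·D steps of store-and-forward (each node waits for the whole message) to (m - 1) + D, where D is the tree depth. A larger k shrinks D, but more children then share one parent's buffer, and the model ignores that cost.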

Source: epfl.ch