XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters

Authors: Masahiro Nakao, Hitoshi Murai, Takenori Shimosaka, Akihiro Tabuchi, Toshihiro Hanawa

DOI: 10.1109/WACCPD.2014.6

Abstract: The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, to improve the performance of stencil applications, the Omni XACC compiler provides functions that transfer a halo region on accelerator memory via Tightly Coupled Accelerators (TCA), a proprietary network for transferring data directly among accelerators. In the present paper, we evaluate the productivity and performance of XACC through implementations of the HIMENO Benchmark. The results show that, thanks to the productivity improvements, XACC requires less than half the source lines of code compared with a combination of Message Passing Interface (MPI) and OpenACC, which are commonly used together as a typical programming model. As a result of these performance improvements, XACC using TCA achieved up to 2.7 times faster performance than could be obtained with the combination of MPI and OpenACC using GPUDirect RDMA over InfiniBand.
