Evaluating the Impact of Proposed OpenMP 5.0 Features on Performance, Portability and Productivity

Authors: Simon J. Pennycook, Jason D. Sewall, Jeff R. Hammond

DOI: 10.1109/P3HPC.2018.00007

Keywords: Software portability, Suite, Code (cryptography), Porting, Instruction set, Computer science, Productivity, Computer architecture, Benchmark (computing), Specialization (functional)

Abstract: We investigate how the specialization mechanisms proposed for OpenMP 5.0 (specifically, the metadirective and declare variant directives) may be deployed in a real-life code, using the miniMD benchmark from the Mantevo suite. Additionally, we develop an OpenMP 4.5 implementation of miniMD that achieves a performance portability of 59.35% across contemporary CPU and GPU hardware, discuss the porting process that enabled this, and show that the use of OpenMP 5.0 features would enable our code to be expressed in a significantly more compact form, with implications for productivity.
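The abstract names two OpenMP 5.0 specialization mechanisms. As a rough illustration only (this is not code from the paper or from miniMD), the C sketch below shows the general shape of each: `declare variant` swaps in a different function implementation when the calling context matches a selector, while `metadirective` swaps in a different directive for the same loop. The function names `scale`, `scale_gpu` and `dot` are hypothetical. The GPU paths are assumed to be taken only when an OpenMP 5.0 compiler with offload support is used; with OpenMP disabled, the pragmas are ignored and the code runs serially on the host.

```c
#include <assert.h>

/* Hypothetical GPU-tuned variant of scale(). In this sketch it performs the
 * same arithmetic as the base version; a real variant would be specialized. */
void scale_gpu(double *v, double s, int n)
{
    for (int i = 0; i < n; i++)
        v[i] *= s;
}

/* OpenMP 5.0 declare variant: when the call site's context matches a GPU
 * device, the compiler substitutes scale_gpu(); otherwise this base
 * function is called. */
#pragma omp declare variant(scale_gpu) match(device = {kind(gpu)})
void scale(double *v, double s, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        v[i] *= s;
}

/* Dot product whose loop parallelization is chosen by an OpenMP 5.0
 * metadirective: GPU devices get a teams-based decomposition, while any
 * other device (including host fallback) gets a threaded SIMD loop. */
double dot(const double *a, const double *b, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp target map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp metadirective \
        when(device = {kind(gpu)}: teams distribute parallel for reduction(+: sum)) \
        default(parallel for simd reduction(+: sum))
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The point of the sketch is that a single source loop carries both the CPU and GPU parallelization strategies, instead of duplicating the loop under preprocessor guards as an OpenMP 4.5 code would typically need to. For example, with `a = {1, 2, 3, 4}` and `b = {1, 1, 1, 1}`, `dot(a, b, 4)` evaluates to `10.0`, and after `scale(a, 2.0, 4)` it evaluates to `20.0`.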
