Authors: Aaron Walden, Eric J. Nielsen, Mohammad Zubair, John C. Linford, Justin P. Luitjens
DOI:
Keywords:
Abstract: We explore the transition of a legacy, MPI-only, domain-decomposed unstructured-grid code, highly optimized for multi-core systems, to shared-memory MPI+OpenMP and MPI+CUDA models more suitable for a future high-performance computing landscape dominated by heterogeneous many-core architectures. We study node-level performance characteristics of compute-intensive kernels hand-optimized using CUDA and AVX-512 vector intrinsics. Strong-scaling results are presented that contrast the scalability of the original MPI-only model with that of the hybrid models.