Authors: Tom Cornebize, Arnaud Legrand, Franz C. Heinrich
DOI: 10.1109/CLUSTER.2019.8891011
Keywords:
Abstract: Finely tuning MPI applications (number of processes, granularity, collective operation algorithms, topology and process placement) is critical to obtain good performance on supercomputers. With the rising cost of modern supercomputers, running parallel applications at scale solely to optimize their performance is extremely expensive. Having inexpensive but faithful predictions of expected performance could be a great help for researchers and system administrators. The methodology we propose captures the complexity of adaptive applications by emulating the MPI code while skipping insignificant parts. We demonstrate its capability with High Performance Linpack (HPL), the benchmark used to rank supercomputers in the TOP500, which requires careful tuning. We explain (1) how we both extended SimGrid’s SMPI simulator and slightly modified the open-source version of HPL to allow a fast emulation on a single commodity server at the scale of a supercomputer and (2) how to model the different components (network, BLAS, …) of the system. We show that a careful modeling of both spatial and temporal node variability allows us to obtain predictions within a few percent of real experiments (see Figure 1).