Authors: Yan-Tyng Sherry Chang, John Bauer, Henry Jin
Keywords: Supercomputer, Lustre (file system), Internet of Things, Speedup, Parallel computing, Development environment, Heavy load, Server, Computer science
Abstract: Combining the strengths of MPIProf and IOT, an efficient and systematic method is devised for I/O characterization at the per-job, per-rank, per-file, and per-call levels of programs running on the high-performance computing resources at the NASA Advanced Supercomputing (NAS) facility. This method is applied to answer four I/O questions in this paper. A total of 13 MPI programs and 15 cases, ranging from 24 to 5968 ranks, are analyzed to establish the I/O landscape from the answers to the four questions. Four of the programs use MPI I/O, and the behavior of their collective writes depends on the specific implementation of the MPI library used. The SGI MPT library, the prevailing MPI library on NAS systems, was found to automatically gather small writes from a large number of ranks in order to perform larger writes by a small subset of collective buffering ranks. The number of collective buffering ranks invoked by MPT depends on the Lustre stripe count and the number of nodes used in the run. A demonstration of varying the stripe count to achieve double-digit speedup of one program's I/O is presented. Another program, which concurrently opens private files by all ranks and could potentially create a heavy load on the Lustre servers, is identified. The ability to systematically characterize I/O for a large number of programs running on a supercomputer, seek I/O optimization opportunities, and identify programs that could cause a high load and instability on the filesystems is important for pursuing exascale in a real production environment.
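The four characterization levels named in the abstract (per-job, per-rank, per-file, per-call) can be illustrated with a minimal Python sketch. It assumes a hypothetical record layout of `(rank, file, call, bytes)` and made-up sample values; it does not reproduce the actual output format of MPIProf or IOT, and the `summarize` helper is introduced here only for illustration.

```python
from collections import defaultdict

# Hypothetical I/O trace records: (rank, file name, call type, bytes moved).
# The layout and the values are assumptions, not MPIProf or IOT output.
records = [
    (0, "out.dat", "write", 4096),
    (0, "out.dat", "write", 4096),
    (1, "out.dat", "write", 8192),
    (1, "log.1", "write", 128),
    (0, "in.dat", "read", 1024),
]


def summarize(records, key):
    """Count calls and sum bytes for each group returned by key(record)."""
    totals = defaultdict(lambda: {"calls": 0, "bytes": 0})
    for rec in records:
        group = totals[key(rec)]
        group["calls"] += 1
        group["bytes"] += rec[3]
    return dict(totals)


# The same records viewed at each of the four levels.
per_job = summarize(records, lambda r: "job")
per_rank = summarize(records, lambda r: r[0])
per_file = summarize(records, lambda r: r[1])
per_call = summarize(records, lambda r: r[2])
```

Grouping one record stream by different keys keeps the levels consistent with each other: the byte totals at every level sum to the same per-job figure.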