Authors: Mark Frederick Hoemmen, Ichitaro Yamazaki, Hartwig Anzt, Stanimire Tomov, Jack Dongarra
DOI:
Keywords:
Abstract: The Generalized Minimum Residual (GMRES) method is one of the most widely used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since GPUs are now becoming a crucial component in computing, in this paper we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present detailed performance studies of a matrix-powers kernel on the GPUs, we particularly focus on the orthogonalization strategies, which have a great impact not only on the numerical stability of GMRES but also on its performance, especially as the coefficient matrix becomes sparser or more ill-conditioned. We present experimental results on two six-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication both on a single GPU and between the GPUs. As part of our studies, we investigate several optimization techniques for the GPU kernels that are also used in other sparse solvers besides GMRES. Hence, our studies not only demonstrate the importance of avoiding communication on the GPUs but also provide several insights into the effects of these optimization techniques on the performance of sparse solvers.
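To illustrate the two building blocks the abstract refers to, the sketch below shows a minimal CPU-only version of a matrix-powers kernel (generating the Krylov basis vectors v, Av, ..., A^s v in one sweep) and a Cholesky-QR block orthogonalization, which replaces s rounds of inner products with a single block reduction. This is a hypothetical illustration under simplifying assumptions (monomial basis, dense Python/NumPy arithmetic), not the authors' GPU implementation or kernel interface.

```python
# Hypothetical sketch of the matrix-powers kernel and block orthogonalization
# ideas discussed in the abstract; not the paper's GPU code.
import numpy as np
import scipy.sparse as sp

def matrix_powers(A, v, s):
    """Return V = [v, Av, ..., A^s v] as columns (monomial basis, for clarity)."""
    V = np.empty((v.size, s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]      # one SpMV per new basis vector
    return V

def chol_qr(V):
    """Orthonormalize the columns of V with a single block inner product."""
    G = V.T @ V                        # one global reduction: Gram matrix
    R = np.linalg.cholesky(G).T        # G = R^T R
    Q = V @ np.linalg.inv(R)           # columns of Q are orthonormal if V is well conditioned
    return Q, R

# Small usage example on a random sparse matrix.
n, s = 1000, 5
A = sp.random(n, n, density=0.01, format="csr") + sp.eye(n)
v = np.random.rand(n)
Q, R = chol_qr(matrix_powers(A, v, s))
print(np.max(np.abs(Q.T @ Q - np.eye(s + 1))))   # should be near machine precision
```

In practice the monomial basis above becomes ill-conditioned as s grows, which is one reason the choice of basis and of orthogonalization strategy matters so much for the stability of communication-avoiding GMRES.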