Authors: Ali Cevahir, Akira Nukada, Satoshi Matsuoka
DOI: 10.1007/978-3-642-01970-8_90
Keywords:
Abstract: The limiting factor for the efficiency of sparse linear solvers is memory bandwidth. In this work, we describe a fast Conjugate Gradient (CG) solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double-precision accuracy with single-precision GPUs, using a mixed-precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, which is the core operation of iterative solvers. The proposed multiplication algorithm efficiently utilizes GPU resources via caching, coalesced memory accesses, and load balance between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card, and our CG implementation achieves up to 24.6 Gflops with four GPUs.
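
The abstract names the ingredients (CSR-style sparse matrix-vector multiplication, Conjugate Gradient, and mixed-precision iterative refinement) without showing how they fit together. The following is a minimal CPU-only sketch of that combination, not the authors' GPU implementation. Every name in it (`f32`, `spmv`, `cg`, `refine`, `laplacian_csr`) is hypothetical, single precision is only emulated by rounding each intermediate result through a 4-byte float, and the tolerances and the small 1D Laplacian test matrix are assumptions chosen for illustration.

```python
import math
import struct


def f32(x):
    """Round a double to the nearest single-precision value (emulation only)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def spmv(row_ptr, col_idx, vals, x, rnd=float):
    """CSR product y = A x; rnd is applied after every multiply and add."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc = rnd(acc + rnd(vals[k] * x[col_idx[k]]))
        y.append(acc)
    return y


def dot(u, v, rnd=float):
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc


def cg(row_ptr, col_idx, vals, b, rel_tol, max_iter, rnd=float):
    """Unpreconditioned CG for a symmetric positive definite A, starting at 0."""
    x = [0.0] * len(b)
    r = [rnd(v) for v in b]
    p = list(r)
    rs = dot(r, r, rnd)
    stop = rel_tol * rel_tol * rs
    for _ in range(max_iter):
        if rs <= stop:
            break
        Ap = spmv(row_ptr, col_idx, vals, p, rnd)
        pAp = dot(p, Ap, rnd)
        if pAp <= 0.0:  # rounding breakdown; return the current iterate
            break
        alpha = rnd(rs / pAp)
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * ai)) for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r, rnd)
        beta = rnd(rs_new / rs)
        p = [rnd(ri + rnd(beta * pi)) for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def refine(row_ptr, col_idx, vals, b, tol=1e-12, max_outer=30):
    """Iterative refinement: double-precision residual, single-precision solve."""
    vals32 = [f32(v) for v in vals]
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    for _ in range(max_outer):
        Ax = spmv(row_ptr, col_idx, vals, x)        # double precision
        r = [bi - ai for bi, ai in zip(b, Ax)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        # Solve A d = r approximately in emulated single precision.
        d = cg(row_ptr, col_idx, vals32, r, 1e-4, 2 * len(b), rnd=f32)
        x = [xi + di for xi, di in zip(x, d)]       # double-precision update
    return x


def laplacian_csr(n):
    """1D Laplacian (2 on the diagonal, -1 off it) in CSR form."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


n = 8
row_ptr, col_idx, vals = laplacian_csr(n)
b = [1.0 / (i + 1) for i in range(n)]
x = refine(row_ptr, col_idx, vals, b)
residual = [bi - ai for bi, ai in zip(b, spmv(row_ptr, col_idx, vals, x))]
max_residual = max(abs(v) for v in residual)
```

The point of the design is that only the residual and the solution update are done in double precision; the expensive inner solve works entirely on single-precision data, which is the kind of work the abstract assigns to the GPUs.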