Improving the memory-system performance of sparse-matrix vector multiplication

Author: S. Toledo

DOI: 10.1147/RD.416.0711

Abstract: … the code performs by splitting the general sparse matrix A into a sum of two or three … In each experiment we measured the running time of the matrix vector multiplication code, the …

References (19)
K. W. Kennedy, Allan Kennedy Porterfield. Software methods for improvement of cache performance on supercomputer applications. Rice University (1989).
Duncan H. Lawrie, Pen-Chung Yew, Roland Lun Lee. The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors. University of Illinois at Urbana-Champaign (1987).
M. Tremblay, J. M. O'Connor. UltraSparc I: a four-issue processor supporting multimedia. IEEE Micro, vol. 16, pp. 42-50 (1996). DOI: 10.1109/40.491461
A. Gupta. Fast and effective algorithms for graph partitioning and sparse-matrix ordering. IBM Journal of Research and Development, vol. 41, pp. 171-183 (1997). DOI: 10.1147/RD.411.0171
C. L. Lawson, R. J. Hanson, D. R. Kincaid, F. T. Krogh. Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, vol. 5, pp. 308-323 (1979). DOI: 10.1145/355841.355847
E. H. Welbon, C. C. Chan-Nui, D. J. Shippy, D. A. Hicks. The POWER2 performance monitor. IBM Journal of Research and Development, vol. 38, pp. 545-554 (1994). DOI: 10.1147/RD.385.0545
Iain S. Duff, Gérard A. Meurant. The effect of ordering on preconditioned conjugate gradients. BIT, vol. 29, pp. 635-657 (1989). DOI: 10.1007/BF01932738
R. C. Agarwal, B. Alpern, L. Carter, F. G. Gustavson, D. J. Klepacki, R. Lawrence, M. Zubair. High-performance parallel implementations of the NAS kernel benchmarks on the IBM SP2. IBM Systems Journal, vol. 34, pp. 263-272 (1995). DOI: 10.1147/SJ.342.0263
S. W. White, S. Dhawan. POWER2: Next generation of the RISC System/6000 family. IBM Journal of Research and Development, vol. 38, pp. 493-502 (1994). DOI: 10.1147/RD.385.0493