Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

作者: Richard W. Vuduc , Hyun-Jin Moon

DOI: 10.1007/11557654_91

关键词:

摘要: We improve the performance of sparse matrix-vector multiplication(SpMV) on modern cache-based superscalar machines when matrix structure consists multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. split matrix, A, into a sum, A1 + A2 ... As, where each term is stored in new data we refer to as unaligned block compressed row (UBCSR) format. A classical approach which stores BCSR can also reduce execution time, but improvements may be limited because imposes an alignment non-zeros that leads extra work filled-in zeros. Combining splitting with UBCSR reduces while retaining generally lower memory bandwidth requirements and register-level tiling opportunities BCSR. show speedups high 2.1× over no blocking, 1.8× used prior set application matrices. Even does not significantly, usually storage.

参考文章(18)
Roldan Pozo, Karin A Remington, NIST sparse BLAS user's guide National Institute of Standards and Technology. ,(2001) , 10.6028/NIST.IR.6744
Virginia Vassilevska, Ali Pinar, Finding Nonoverlapping Dense Blocks of a Sparse Matrix ,(2004)
Eun Im, Optimizing the Performance of Sparse Matrix-Vector Multiplication University of California, Berkeley. ,(2000)
S. Toledo, Improving the memory-system performance of sparse-matrix vector multiplication Ibm Journal of Research and Development. ,vol. 41, pp. 711- 726 ,(1997) , 10.1147/RD.416.0711
ROMAN GEUS, STEFAN RÖLLIN, Towards a fast parallel sparse matrix-vector multiplication. parallel computing. pp. 308- 315 ,(2000) , 10.1142/9781848160170_0036
A. H. Baker, E. R. Jessup, T. Manteuffel, A Technique for Accelerating the Convergence of Restarted GMRES SIAM Journal on Matrix Analysis and Applications. ,vol. 26, pp. 962- 984 ,(2005) , 10.1137/S0895479803422014
Rajesh Nishtala, Richard Vuduc, James W. Demmel, Shoaib Kamil, Katherine A. Yelick, Benjamin Lee, Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply conference on high performance computing (supercomputing). pp. 1- 35 ,(2002) , 10.5555/762761.762822
Ali Pinar, Michael T. Heath, Improving Performance of Sparse Matrix-Vector Multiplication conference on high performance computing (supercomputing). pp. 30- 30 ,(1999) , 10.1145/331532.331562
Eun-Jin Im, Katherine Yelick, Richard Vuduc, Sparsity: Optimization Framework for Sparse Matrix Kernels ieee international conference on high performance computing data and analytics. ,vol. 18, pp. 135- 158 ,(2004) , 10.1177/1094342004041296
Eduardo F. D’Azevedo, Mark R. Fahey, Richard T. Mills, Vectorized Sparse Matrix Multiply for Compressed Row Storage Format Lecture Notes in Computer Science. pp. 99- 106 ,(2005) , 10.1007/11428831_13