Authors: Rajesh Nishtala, Richard Vuduc, James W. Demmel, Shoaib Kamil, Katherine A. Yelick
Keywords:
Abstract: We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned approaches can achieve these limits.

Specifically, we develop upper and lower bounds on the performance (in Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern of the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our implementations with respect to these bounds, using hardware counter data, on 4 different platforms and a test set of 44 matrices. We find that we can often get within 20% of the upper bound, particularly on a class of matrices from finite element modeling (FEM) problems; on non-FEM matrices, improvements of 2× may still be possible. Lastly, we present a new heuristic that selects optimal or near-optimal block sizes (the key tuning parameters) more accurately than our previous heuristic. Using the new heuristic, we show speedups of as much as 2.5× over an untuned implementation.

Collectively, these results suggest that future improvements, beyond those we have already demonstrated for SpM×V, will come from two sources: (1) consideration of higher-level matrix structures (e.g., exploiting symmetry, reordering, multiple block sizes), and (2) optimizing kernels with more opportunity for data reuse (e.g., sparse matrix-multiple vector multiply, multiplication of AᵀA by a vector).
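The performance bounds mentioned in the abstract are derived by charging a cost to each basic memory operation. The following is a sketch of the general form such a bound takes; the symbols (k non-zeros, h_i hits, m misses, α costs) are generic assumptions of this sketch, not necessarily the paper's exact notation:

```latex
% Illustrative form of a memory-cost performance bound (generic
% notation, not necessarily the paper's).  The kernel performs
% 2k flops for k non-zeros; h_i hits at cache level i cost
% \alpha_i seconds each, and m accesses to main memory cost
% \alpha_{mem} each.  Minimizing T over feasible hit/miss counts
% gives T_min, and hence an upper bound on performance P.
\[
  T \;\ge\; \sum_i h_i \,\alpha_i \;+\; m\,\alpha_{\mathrm{mem}},
  \qquad
  P_{\mathrm{upper}} \;=\; \frac{2k}{T_{\mathrm{min}}} \times 10^{-6}
  \;\;\text{Mflop/s}.
\]
```

Register blocking itself reorganizes the matrix into small dense r×c blocks stored contiguously (a Block Compressed Sparse Row layout), so the inner loop can be fully unrolled and its operands held in registers. Below is a minimal C sketch for a fixed 2×2 block size; the data-structure names (brow_ptr, bcol_ind, bval) are illustrative assumptions, not the authors' implementation:

```c
#include <stdio.h>

/* Sketch: y += A*x with A in 2x2 Block Compressed Sparse Row (BCSR)
 * format.  Fixing the block shape at compile time lets the 2x2 inner
 * body be fully unrolled with its operands kept in registers.
 * Names here are illustrative, not the paper's code. */
void bcsr_spmv_2x2(int n_brows,          /* number of 2-row block rows    */
                   const int *brow_ptr,  /* n_brows+1 block-row offsets   */
                   const int *bcol_ind,  /* left column of each block     */
                   const double *bval,   /* 4 values per block, row-major */
                   const double *x, double *y)
{
    for (int I = 0; I < n_brows; I++) {
        /* Accumulate both output elements of this block row in
         * registers; write them back to memory once, after the loop. */
        double y0 = y[2*I], y1 = y[2*I + 1];
        for (int k = brow_ptr[I]; k < brow_ptr[I+1]; k++) {
            const double *b = &bval[4*k];
            int j = bcol_ind[k];
            double x0 = x[j], x1 = x[j+1];  /* x reused across both rows */
            y0 += b[0]*x0 + b[1]*x1;        /* fully unrolled 2x2 block  */
            y1 += b[2]*x0 + b[3]*x1;
        }
        y[2*I]     = y0;
        y[2*I + 1] = y1;
    }
}

int main(void) {
    /* 4x4 block-diagonal example: blocks [[1 2],[3 4]] and [[5 6],[7 8]]. */
    int brow_ptr[] = {0, 1, 2};
    int bcol_ind[] = {0, 2};
    double bval[]  = {1,2,3,4, 5,6,7,8};
    double x[] = {1,1,1,1}, y[4] = {0};
    bcsr_spmv_2x2(2, brow_ptr, bcol_ind, bval, x, y);
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]); /* prints 3 7 11 15 */
    return 0;
}
```

The fixed 2×2 shape is what allows the unrolled body and the register-resident accumulators; a tuning heuristic of the kind the abstract describes would choose among such (r, c) variants per matrix and machine.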