Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU

Authors: Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel

DOI: 10.1145/3079079.3079086

Abstract: The rising popularity of the graphics processing unit (GPU) across various numerical computing applications triggered a breakneck race to optimize key numerical kernels and, in particular, the sparse matrix-vector product (SpMV). Despite great strides, most existing GPU-SpMV approaches trade off one aspect of performance against another. They either require preprocessing, exhibit inconsistent behavior, lead to execution divergence, suffer load imbalance, or induce detrimental memory access patterns. In this paper, we present an uncompromising approach for SpMV on the GPU. Our approach requires no separate preprocessing or knowledge of the matrix structure and works directly on the standard compressed sparse rows (CSR) data format. From a global perspective, it exhibits a homogeneous behavior reflected in efficient memory access patterns and a steady per-thread workload. From a local perspective, it avoids heterogeneous execution paths by adapting its behavior to the workload at hand, uses an efficient encoding to keep temporary data requirements for on-chip memory low, and leads to divergence-free execution. We evaluate our approach on more than 2500 matrices, comparing it to vendor-provided and state-of-the-art SpMV implementations. Our approach not only significantly outperforms approaches operating directly on the CSR format (20% average performance increase), but also outperforms approaches that preprocess the matrix, even when preprocessing time is discarded. Additionally, the same strategies lead to a significant performance increase when adapted for transpose SpMV.
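For orientation, the CSR layout mentioned in the abstract stores a sparse matrix as three arrays: row offsets, column indices, and nonzero values. The sequential Python sketch below shows only the computation that SpMV and transpose SpMV define on that layout. It is not the paper's GPU algorithm and makes no attempt at load balancing or memory coalescing. The function and parameter names (`csr_spmv`, `csr_spmv_transpose`, `row_ptr`, `col_idx`, `values`) are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Reference y = A @ x for a matrix A stored in CSR (illustrative only)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_transpose(row_ptr: Sequence[int], col_idx: Sequence[int],
                       values: Sequence[float], x: Sequence[float],
                       n_cols: int) -> List[float]:
    """Reference y = A^T @ x without building A^T (illustrative only)."""
    y = [0.0] * n_cols
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Each nonzero A[row, col] scatters into output entry col.
            y[col_idx[k]] += values[k] * x[row]
    return y
```

In the row-wise loop the amount of work per row equals that row's nonzero count, so a naive one-thread-per-row mapping inherits whatever imbalance the matrix has. That is the kind of load imbalance and divergence the abstract says the approach is designed to avoid.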

References (29)
Yongchao Liu, Bertil Schmidt. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs. Application-Specific Systems, Architectures, and Processors, pp. 82-89 (2015). DOI: 10.1109/ASAP.2015.7245713
Juan C. Pichel, Francisco F. Rivera, Marcos Fernández, Aurelio Rodríguez. Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs. Microprocessors and Microsystems, vol. 36, pp. 65-77 (2012). DOI: 10.1016/J.MICPRO.2011.05.005
F. Vázquez, J. J. Fernández, E. M. Garzón. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation: Practice and Experience, vol. 23, pp. 815-826 (2011). DOI: 10.1002/CPE.1658
Weifeng Liu, Brian Vinter. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. International Conference on Supercomputing, pp. 339-350 (2015). DOI: 10.1145/2751205.2751209
Bor-Yiing Su, Kurt Keutzer. clSpMV. Proceedings of the 26th ACM International Conference on Supercomputing (ICS '12), pp. 353-364 (2012). DOI: 10.1145/2304576.2304624
Timothy A. Davis, Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software, vol. 38, pp. 1-25 (2011). DOI: 10.1145/2049662.2049663
Hiroki Yoshizawa, Daisuke Takahashi. Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS Format on GPUs. 2012 IEEE 15th International Conference on Computational Science and Engineering, pp. 130-136 (2012). DOI: 10.1109/ICCSE.2012.28
Wai Teng Tang, Wen Jun Tan, Rajarshi Ray, Yi Wen Wong, Weiguang Chen, Shyh-hao Kuo, Rick Siow Mong Goh, Stephen John Turner, Weng-Fai Wong. Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes. IEEE International Conference on High Performance Computing, Data, and Analytics, pp. 26- (2013). DOI: 10.1145/2503210.2503234
Arash Ashari, Naser Sedaghati, John Eisenlohr, Srinivasan Parthasarathy, P. Sadayappan. Fast sparse matrix-vector multiplication on GPUs for graph applications. IEEE International Conference on High Performance Computing, Data, and Analytics, pp. 781-792 (2014). DOI: 10.1109/SC.2014.69
Joseph L. Greathouse, Mayank Daga. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. IEEE International Conference on High Performance Computing, Data, and Analytics, pp. 769-780 (2014). DOI: 10.1109/SC.2014.68