Authors: Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel
Keywords:
Abstract: The rising popularity of the graphics processing unit (GPU) across various numerical computing applications triggered a breakneck race to optimize key kernels and in particular, the sparse matrix-vector product (SpMV). Despite great strides, most existing GPU-SpMV approaches trade off one aspect of performance against another. They either require preprocessing, exhibit inconsistent behavior, lead to execution divergence, suffer from load imbalance, or induce detrimental memory access patterns. In this paper, we present an uncompromising approach for SpMV on the GPU. Our approach requires no separate preprocessing or knowledge of the matrix structure and works directly on the standard compressed sparse rows (CSR) data format. From a global perspective, it exhibits homogeneous behavior reflected in efficient memory access patterns and a steady per-thread workload. From a local perspective, it avoids heterogeneous execution paths by adapting to the work at hand, uses an efficient encoding to keep temporary memory requirements on-chip low, and leads to divergence-free execution. We evaluate our approach on more than 2500 matrices, comparing against vendor-provided and state-of-the-art implementations. Our approach not only significantly outperforms approaches operating on the CSR format (20% average performance increase), but also those that preprocess the matrix, even when preprocessing time is discarded. Additionally, the same strategies yield a significant performance increase when adapted for the transpose SpMV.
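For readers unfamiliar with the CSR data format the abstract refers to, the following is a minimal sequential sketch of SpMV over CSR (illustrative only, not the paper's GPU method): a sparse matrix is stored as three arrays, `row_ptr`, `col_idx`, and `values`, and each output entry is a dot product over one row's nonzeros.

```python
# Minimal CSR SpMV sketch (sequential; NOT the paper's GPU algorithm).
# CSR stores an n-row sparse matrix as three arrays:
#   row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
#   col_idx[k] is the column of the k-th nonzero,
#   values[k]  is its value.
def spmv_csr(row_ptr, col_idx, values, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Accumulate the dot product of row i with x.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# Example 3x3 matrix:
# [[10, 0, 2],
#  [ 0, 3, 0],
#  [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 1.0, 1.0]
print(spmv_csr(row_ptr, col_idx, values, x))  # -> [12.0, 3.0, 5.0]
```

The irregularity visible here (rows have different nonzero counts, and `col_idx` drives indirect accesses into `x`) is exactly the source of the load imbalance, divergence, and memory-access problems the abstract describes for parallel GPU execution.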