Authors: Richard W. Vuduc, Hyun-Jin Moon
DOI: 10.1007/11557654_91
Keywords:
Abstract: We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach which stores A in block compressed sparse row (BCSR) format can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. We show speedups as high as 2.1× over no blocking, and as high as 1.8× over BCSR as used in prior work, on a set of application matrices. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.
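
The fill-in effect described in the abstract can be illustrated with a toy example. The sketch below is not the authors' UBCSR layout: the dict-based block storage, the function names, and the 4 x 6 test matrix are all assumptions made for illustration. It shows how forcing 2 x 2 blocks onto an aligned grid stores explicit zeros for a dense block that straddles the grid, while letting block corners sit anywhere avoids them.

```python
# Toy illustration only; names and storage scheme are hypothetical,
# not the UBCSR data structure from the paper.

def aligned_blocks(nonzeros, r, c):
    """Group nonzeros into r x c blocks whose corners are multiples of (r, c)."""
    blocks = {}
    for (i, j), v in nonzeros.items():
        corner = (i - i % r, j - j % c)
        blocks.setdefault(corner, {})[(i - corner[0], j - corner[1])] = v
    return blocks

def spmv_blocked(blocks, r, c, x, n_rows):
    """Compute y = A*x, treating every block as a dense r x c tile."""
    y = [0.0] * n_rows
    for (i0, j0), entries in blocks.items():
        for di in range(r):
            for dj in range(c):
                # Entries absent from a block act as filled-in zeros.
                y[i0 + di] += entries.get((di, dj), 0.0) * x[j0 + dj]
    return y

def stored_values(blocks, r, c):
    """Number of values stored, counting filled-in zeros."""
    return len(blocks) * r * c

# A 4 x 6 matrix with two dense 2 x 2 blocks: one at rows 0-1, cols 0-1
# (on the aligned grid) and one at rows 1-2, cols 3-4 (off the grid).
nonzeros = {
    (0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0,
    (1, 3): 5.0, (1, 4): 6.0, (2, 3): 7.0, (2, 4): 8.0,
}
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

aligned = aligned_blocks(nonzeros, 2, 2)

# Unaligned storage: block corners are recorded explicitly, at any position.
unaligned = {
    (0, 0): {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0},
    (1, 3): {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0},
}

y_aligned = spmv_blocked(aligned, 2, 2, x, 4)
y_unaligned = spmv_blocked(unaligned, 2, 2, x, 4)
```

In this example the aligned layout needs five blocks (20 stored values for 8 true non-zeros), whereas the unaligned layout needs two blocks (8 stored values), and both produce the same product.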