Adaptive tuning of sparse matrix-vector multiplication on Cell architecture

Authors: Qian Cao, Chongchong Zhao, Yunxing Zhang, Junxiu Chen, Yutian Zhu

DOI: 10.1109/ICCET.2010.5485581

Abstract: Sparse matrix-vector multiplication (SpMV) is a kernel that is widely used in scientific applications. The sparse data, usually stored in the compressed row storage format, introduces an irregular reference pattern. This is a problem for the software cache on the Cell architecture, because in traditional strategies the cache line is always set to a specific size, which limits utilization and increases memory bandwidth overhead. In this paper, we propose an adaptive strategy that continuously adjusts the line size during SpMV execution. As a result, both the transferred data and the execution time are significantly decreased. Moreover, a prefetching scheme is proposed to further improve performance. The evaluation indicates that our approach achieves a speedup factor from 2.11 to 3.57 compared with the traditional approach, and that it translates into up to 3.2 [...] multiplications.

References (28)
W. Meesen, R.H. Bisseling, Communication balancing in parallel sparse matrix-vector multiplication, Electronic Transactions on Numerical Analysis, vol. 21, pp. 47-65 (2005)
Leonid Oliker, John Shalf, Parry Husbands, Katherine Yelick, Samuel W. Williams, Dense and Sparse Matrix Operations on the Cell Processor, Lawrence Berkeley National Laboratory (2005)
David Moloney, Dermot Geraghty, Colm McSweeney, Ciaran McElroy, Streaming sparse matrix compression/decompression, High Performance Embedded Architectures and Compilers, pp. 116-129 (2005), doi:10.1007/11587514_9
Leonid Oliker, John Shalf, Parry Husbands, Shoaib Kamil, Katherine Yelick, Samuel Williams, The Potential of the Cell Processor for Scientific Computing, International Parallel & Distributed Processing Symposium - 2006, Rhodes Island, Greece, April 25-29, 2006 (2005)
Richard W. Vuduc, Hyun-Jin Moon, Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure, High Performance Computing and Communications, pp. 807-816 (2005), doi:10.1007/11557654_91
S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, PERI - auto-tuning memory-intensive kernels for multicore, Lawrence Berkeley National Laboratory, vol. 125, pp. 012038- (2008), doi:10.1088/1742-6596/125/1/012038
Jeremiah Willcock, Andrew Lumsdaine, Accelerating sparse matrix computations via data compression, Proceedings of the 20th annual international conference on Supercomputing - ICS '06, pp. 307-316 (2006), doi:10.1145/1183401.1183444
Kornilios Kourtis, Georgios Goumas, Nectarios Koziris, Optimizing sparse matrix-vector multiplication using index and value compression, Proceedings of the 2008 conference on Computing frontiers - CF '08, pp. 87-96 (2008), doi:10.1145/1366230.1366244
Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07, pp. 38- (2007), doi:10.1145/1362622.1362674
Seyong Lee, Rudolf Eigenmann, Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems, Proceedings of the 22nd annual international conference on Supercomputing - ICS '08, pp. 195-204 (2008), doi:10.1145/1375527.1375558