Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures

Authors: Shizhao Chen, Jianbin Fang, Donglin Chen, Chuanfu Xu, Zheng Wang

DOI: 10.1109/HPCC/SMARTCITY/DSS.2018.00116

Abstract: Sparse matrix-vector multiplication (SpMV) is one of the most common operations in scientific and high-performance applications, and it is often responsible for the application's performance bottleneck. While the sparse matrix representation has a significant impact on the resulting performance, choosing the right representation typically relies on expert knowledge and trial and error. This paper provides the first comprehensive study of sparse matrix representations on two emerging many-core architectures: Intel's Knights Landing (KNL) Xeon Phi and the ARM-based FT-2000Plus (FTP). Our large-scale experiments involved over 9,500 distinct profiling runs performed on 956 datasets and five mainstream SpMV representations. We show that the best representation depends on the underlying architecture and the program input. To help developers choose the optimal representation, we employ machine learning to develop a predictive model. The model is trained offline using a set of training examples. The learned model can then be used to predict the best representation for any unseen input on a given architecture. We show that our model delivers on average 95% and 91% of the best available performance on KNL and FTP respectively, and it achieves this with no runtime overhead.
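To make the notion of a "sparse representation" concrete, the sketch below computes the same SpMV with two storage layouts, CSR (compressed sparse row) and COO (coordinate). These two formats are chosen for illustration only; the abstract does not name the five representations studied, and the function names and the small 3x3 matrix are hypothetical. This is not the authors' implementation.

```python
# Illustrative sketch only: CSR and COO are assumed example formats,
# not necessarily among the five representations studied in the paper.

def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x with A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def spmv_coo(rows, cols, values, x, n_rows):
    """y = A @ x with A stored as (row, column, value) triples."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, values):
        y[r] += v * x[c]
    return y


# Hypothetical 3x3 matrix:
#   [[4, 0, 1],
#    [0, 2, 0],
#    [3, 0, 5]]
VALUES = [4.0, 1.0, 2.0, 3.0, 5.0]
CSR_ROW_PTR = [0, 2, 3, 5]
CSR_COL_IDX = [0, 2, 1, 0, 2]
COO_ROWS = [0, 0, 1, 2, 2]
COO_COLS = [0, 2, 1, 0, 2]
X = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    print(spmv_csr(CSR_ROW_PTR, CSR_COL_IDX, VALUES, X))
    print(spmv_coo(COO_ROWS, COO_COLS, VALUES, X, n_rows=3))
```

Both functions return the same vector, but they traverse memory differently: CSR walks each row's nonzeros contiguously, while COO carries an explicit row index for every nonzero. Differences of this kind are why the best-performing layout can vary with the matrix structure and the hardware.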
