A Note on Auto-tuning GEMM for GPUs

Authors: Yinan Li, Jack Dongarra, Stanimire Tomov

DOI: 10.1007/978-3-642-01970-8_89

Keywords: Computer science, Double-precision floating-point format, Matrix multiplication, Computational science, Graphics, Key (cryptography), Parallel computing, Single-precision floating-point format, Software portability, Linear algebra, CUDA

Abstract: The development of high-performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is …
