Automatic blocking of QR and LU factorizations for locality

Authors: Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, Jack Dongarra

DOI: 10.1145/1065895.1065898

Abstract: QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To perform these computations efficiently on modern computers, the factorization algorithms need to be blocked when operating on large matrices, so that they effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both the QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, automatically generating blocked versions of the computations yields further benefits, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of the auto-blocked versions with manually tuned versions in LAPACK, using reference BLAS, ATLAS BLAS, and native BLAS specially tuned for the underlying machine architectures.

References (24)
Tatiana Shpeisman, David Wonnacott, William Pugh, Vadim Maslov, Wayne Kelly, Evan Rosser. The Omega Library interface guide. University of Maryland at College Park (1995).
William Pugh, Evan Rosser. Iteration Space Slicing for Locality. Languages and Compilers for Parallel Computing, pp. 164-184 (1999). DOI: 10.1007/3-540-44905-1_11
Nicholas Mitchell, Larry Carter, Jeanne Ferrante, Karin Högstedt. Quantifying the Multi-level Nature of Tiling Interactions. Languages and Compilers for Parallel Computing, pp. 1-15 (1997). DOI: 10.1007/BFB0032680
Amy W. Lim, Gerald I. Cheong, Monica S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. International Conference on Supercomputing, pp. 228-237 (1999). DOI: 10.1145/305138.305197
Induprakas Kodukula, Nawaaz Ahmed, Keshav Pingali. Data-centric multi-level blocking. Programming Language Design and Implementation, vol. 32, pp. 346-357 (1997). DOI: 10.1145/258915.258946
Nawaaz Ahmed, Nikolay Mateev, Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. International Conference on Supercomputing, pp. 141-152 (2000). DOI: 10.1145/2591635.2667179