Automatic blocking of QR and LU factorizations for locality

Authors: Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, Jack Dongarra

Abstract: QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To perform these computations efficiently on modern computers, the factorization algorithms need to be blocked when operating on large matrices, so that they effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both the QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, automatically generating blocked versions of the computations offers further benefits, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of the auto-blocked versions with manually tuned versions in LAPACK, using reference BLAS, ATLAS BLAS, and native BLAS specially tuned for the underlying machine architectures.
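To make "blocking" concrete, the following is a minimal hand-written sketch of LU factorization with partial pivoting in unblocked form and in a right-looking blocked form. It is not output of the paper's optimizer and does not implement dependence hoisting; the function names `lu_unblocked` and `lu_blocked` and the `block` parameter are illustrative assumptions. The blocked variant factors a narrow column panel, then updates the rest of the matrix panel by panel, which is the access pattern that lets a cache hold the working set.

```python
def lu_unblocked(a):
    """In-place LU with partial pivoting on a square list-of-lists matrix.

    On return, the strict lower triangle holds L (unit diagonal implied) and
    the upper triangle holds U. Returns perm, where row i of the factored
    matrix corresponds to original row perm[i].
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm


def lu_blocked(a, block=2):
    """Right-looking blocked LU with partial pivoting (illustrative sketch)."""
    n = len(a)
    perm = list(range(n))
    for kb in range(0, n, block):
        ke = min(kb + block, n)
        # Step 1: factor the panel (columns kb..ke-1), swapping whole rows.
        for k in range(kb, ke):
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            if a[p][k] == 0.0:
                raise ValueError("matrix is singular")
            if p != k:
                a[k], a[p] = a[p], a[k]
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, ke):  # updates stay inside the panel
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 2: triangular solve for the block row of U right of the panel.
        for k in range(kb, ke):
            for i in range(k + 1, ke):
                for j in range(ke, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 3: update the trailing submatrix with a matrix-matrix product.
        for i in range(ke, n):
            for k in range(kb, ke):
                lik = a[i][k]
                for j in range(ke, n):
                    a[i][j] -= lik * a[k][j]
    return perm
```

Both functions apply the same updates to each element in the same order, so they should produce the same factors; only the loop order, and therefore the memory access pattern, differs. In a tuned library, step 3 would typically be delegated to a BLAS matrix-multiply routine rather than written as explicit loops.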

