Authors: Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, Jack Dongarra
Keywords:
Abstract: QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To perform these computations efficiently on modern computers, the factorization algorithms need to be blocked when operating on large matrices, so that they effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both the QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, automatically generating blocked versions of the computations offers further benefits, such as automatic adaptation to different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of the auto-blocked versions with the manually tuned versions in LAPACK, using reference BLAS, ATLAS BLAS, and native BLAS specially tuned for the underlying machine architectures.