Authors: James Demmel, Jack Dongarra, Beresford Parlett, William Kahan, Ming Gu
Abstract: Dense linear algebra (DLA) forms the core of many scientific computing applications. Consequently, there is continuous interest in, and demand for, the development of increasingly better algorithms in the field. Here 'better' has a broad meaning that includes improved reliability, accuracy, robustness, and ease of use, and, most importantly, new or improved algorithms that use the available computational resources more efficiently to speed up the computation. The rapid evolution of high-end computing systems, together with the close dependence of DLA algorithms on the computational environment, is what makes the field particularly dynamic. A typical example of the importance and impact of this dependence is the development of LAPACK [4] (and later ScaLAPACK [19]) as a successor to the well-known and formerly widely used LINPACK [40] and EISPACK [40] libraries. Both LINPACK and EISPACK were based on, and their efficiency depended on, optimized Level 1 BLAS [21]. Hardware development trends, however, and in particular a processor-to-memory speed gap growing at approximately 50% per year, made Level 1 BLAS increasingly inefficient relative to Level 2 and 3 BLAS, which prompted efforts to reorganize DLA algorithms to use block matrix operations in their innermost loops. This became LAPACK's design philosophy. Later, ScaLAPACK extended the LAPACK library to run scalably on distributed-memory parallel computers. Several current trends and associated challenges influence the development of DLA software libraries. The main purpose of this work is to identify these trends, address the new challenges, and …
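The reorganization the abstract describes can be illustrated with a minimal sketch (not taken from the paper): a classical triple-loop matrix multiply, whose inner loop is a Level 1 BLAS-style dot product, contrasted with a blocked variant whose innermost unit of work is an `nb × nb` sub-matrix product, i.e., a Level 3 BLAS-style operation that can stay resident in cache. The block size `nb` here is an illustrative parameter; in LAPACK it is tuned to the memory hierarchy.

```python
def matmul_naive(A, B):
    # Triple loop: the inner k-loop is a dot product (Level 1 BLAS style),
    # streaming each row/column from memory with little reuse.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_blocked(A, B, nb=2):
    # Blocked variant: the innermost loops multiply nb x nb sub-blocks
    # (a Level 3 BLAS-style operation), reusing cached data O(nb) times
    # per element instead of O(1).
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for jj in range(0, n, nb):
            for kk in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    for j in range(jj, min(jj + nb, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + nb, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

Both routines compute the same product; the blocked loop order only changes which data is touched when, which is precisely why blocking can be retrofitted into existing DLA algorithms without changing their numerical results.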