Authors: C.H. Bischof, S. Huss-Lederman, E.M. Jacobson, Xiaobai Sun, A. Tsao
DOI: 10.2172/10153108
Keywords:
Abstract: In this document, the authors are concerned with the effects of data layouts for nonsquare processor meshes on the implementation of common dense linear algebra kernels such as matrix-matrix multiplication, LU factorizations, or eigenvalue solvers. In particular, they address ease of programming and tunability of the resulting software. They introduce a generalization of the torus wrap data layout that results in a decoupling of the "local" and "global" data layout view. As a result, it allows intuitive programming of algorithms and tuning of an algorithm for a particular mesh aspect ratio or machine characteristics. This layout is as simple as the proposed HPF layout but, in the authors' opinion, enhances ease of programming as well as ease of performance tuning. They emphasize that they do not advocate that all users need be concerned with these issues. They do, however, believe that for the foreseeable future "assembler coding" (as message-passing code is likely to be viewed from HPF programmers' perspective) will be needed to deliver high performance for computationally intensive kernels. They believe that adoption of this approach not only would accelerate the generation of efficient software libraries but also would [...] result. They point out, however, that the new layout would necessitate that an HPF compiler ensure that data objects are operated on in a consistent fashion across subroutine and function calls.
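For orientation, the following is a minimal sketch of the standard two-dimensional torus wrap (cyclic) mapping that the abstract takes as its starting point. It is not the authors' generalization, which this record does not describe. The function names and the mesh dimensions `pr` and `pc` are hypothetical, chosen only for illustration. The sketch shows how a global matrix index maps to an owning processor and a local index on a possibly nonsquare mesh.

```python
# Illustrative sketch only: the standard 2D torus wrap (cyclic) layout on a
# pr x pc processor mesh. All names here are hypothetical; this is NOT the
# generalized layout introduced in the report.

def torus_wrap_owner(i, j, pr, pc):
    """Mesh coordinates (p, q) of the processor that owns global entry (i, j)."""
    return (i % pr, j % pc)

def global_to_local(i, j, pr, pc):
    """Local (row, col) index of global entry (i, j) on its owning processor."""
    return (i // pr, j // pc)

def local_to_global(li, lj, p, q, pr, pc):
    """Global (row, col) index of local entry (li, lj) held by processor (p, q)."""
    return (li * pr + p, lj * pc + q)

if __name__ == "__main__":
    pr, pc = 2, 3  # a nonsquare 2 x 3 mesh (assumed for the example)
    i, j = 5, 7
    p, q = torus_wrap_owner(i, j, pr, pc)
    li, lj = global_to_local(i, j, pr, pc)
    print("owner:", (p, q), "local index:", (li, lj))
    # The round trip recovers the original global index.
    assert local_to_global(li, lj, p, q, pr, pc) == (i, j)
```

With a nonsquare mesh such as the assumed 2 x 3 case, rows and columns wrap with different periods, which is the kind of aspect-ratio dependence the abstract refers to.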