Matrix Multiplication on the Intel Touchstone Delta.

Authors: Steven Huss-Lederman, Elaine M. Jacobson, Anna Tsao, Guodong Zhang

Abstract: Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed memory message-passing architecture with a two-dimensional mesh topology. We analyze and compare three algorithms and obtain an implementation, BiMMeR, that uses communication primitives highly suited to the Delta and exploits the single node assembly-coded matrix multiplication. Our algorithm is completely general, i.e., able to deal with various data layouts as well as arbitrary mesh aspect ratios and matrix dimensions, and has achieved a parallel efficiency of 86%, with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800 × 8800 matrix. We describe BiMMeR's design and implementation and demonstrate its scalability and robust behavior over varying mesh topologies.
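
To make the setting concrete, the following is a minimal sketch of a generic broadcast-based block matrix multiplication on a two-dimensional mesh, simulated serially in plain Python. It is not BiMMeR and is not taken from the paper: the function names (`split_ranges`, `local_update`, `mesh_matmul`), the `panel` parameter, and the choice of a simple panel-broadcast scheme are all assumptions made for illustration. The only points it tries to show are that each node owns one block of the result, that panels of the operands travel along mesh rows and mesh columns, and that nothing requires a square mesh or dimensions divisible by the mesh size.

```python
def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal half-open intervals."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def local_update(c_block, a_panel, b_panel):
    """Single-node kernel: c_block += a_panel @ b_panel (stand-in for a tuned routine)."""
    for i, a_row in enumerate(a_panel):
        for t, a in enumerate(a_row):
            b_row = b_panel[t]
            for j, b in enumerate(b_row):
                c_block[i][j] += a * b


def mesh_matmul(A, B, mesh_rows, mesh_cols, panel=2):
    """Simulate C = A @ B on a hypothetical mesh_rows x mesh_cols mesh of nodes."""
    m, k, n = len(A), len(B), len(B[0])
    row_ranges = split_ranges(m, mesh_rows)   # rows of C owned by each mesh row
    col_ranges = split_ranges(n, mesh_cols)   # columns of C owned by each mesh column

    # Node (r, c) owns the block C[row_ranges[r], col_ranges[c]].
    c_local = {
        (r, c): [[0.0] * (c1 - c0) for _ in range(r1 - r0)]
        for r, (r0, r1) in enumerate(row_ranges)
        for c, (c0, c1) in enumerate(col_ranges)
    }

    for k0 in range(0, k, panel):
        k1 = min(k0 + panel, k)
        # Stand-in for a broadcast along each mesh row: a panel of A's columns.
        a_panels = [[A[i][k0:k1] for i in range(r0, r1)] for r0, r1 in row_ranges]
        # Stand-in for a broadcast along each mesh column: a panel of B's rows.
        b_panels = [[B[t][c0:c1] for t in range(k0, k1)] for c0, c1 in col_ranges]
        # Every node updates its own block using only the panels it received.
        for r in range(mesh_rows):
            for c in range(mesh_cols):
                local_update(c_local[(r, c)], a_panels[r], b_panels[c])

    # Gather the distributed blocks into one matrix so the result can be checked.
    C = [[0.0] * n for _ in range(m)]
    for (r, c), block in c_local.items():
        r0, c0 = row_ranges[r][0], col_ranges[c][0]
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                C[r0 + i][c0 + j] = value
    return C
```

For example, `mesh_matmul(A, B, 2, 3)` multiplies a 5 × 3 matrix by a 3 × 4 matrix on a simulated 2 × 3 mesh, so the blocks are deliberately uneven. On a real message-passing machine the two list comprehensions that build `a_panels` and `b_panels` would be replaced by actual row and column broadcasts, and `local_update` by an optimized single-node multiply.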
