Authors: Yinan Li, Jack Dongarra, Stanimire Tomov
DOI: 10.1007/978-3-642-01970-8_89
Keywords: Computer science, Double-precision floating-point format, Matrix multiplication, Computational science, Graphics, Key (cryptography), Parallel computing, Single-precision floating-point format, Software portability, Linear algebra, CUDA
Abstract: The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is …
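For context on the routine the abstract refers to, the following is a minimal sketch of the semantics of GEMM, the BLAS matrix-multiplication routine (C = alpha*A*B + beta*C). This is a plain-Python illustration of what the operation computes, not the paper's GPU implementation; all function and variable names here are illustrative.

```python
def gemm(alpha, A, B, beta, C):
    """Sketch of GEMM semantics: C = alpha * (A @ B) + beta * C.

    A is m x k, B is k x n, C is m x n (lists of lists of floats).
    Updates C in place and returns it, mirroring the BLAS convention
    that C is both an input and the output.
    """
    m, k = len(A), len(A[0])
    n = len(B[0])
    for i in range(m):
        for j in range(n):
            # Inner product of row i of A with column j of B.
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Optimized GEMM implementations (including the GPU kernels this paper concerns) compute the same result but restructure these loops with blocking and parallelism to exploit the memory hierarchy.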