Authors: Gabriel Rivera, Chau-Wen Tseng
DOI: 10.1007/978-3-540-49051-7_12
Keywords: Algorithm, Compiler, Computer science, Optimizing compiler, Linear algebra, Matrix multiplication, Padding, CPU cache, Parallel computing, Program optimization, Loop tiling
Abstract: Linear algebra codes contain data locality which can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low-associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show that padding improves the performance of matrix multiply by over 100% in some cases over a range of matrix sizes. Comparing the efficacy of different tiling algorithms, we discover that rectangular tiles are slightly more efficient than square tiles. Overall, tiling improves performance by 0-250%. Copying tiles at run time proves to be quite effective.
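As a rough illustration of the two ideas the abstract combines, the sketch below tiles the `k` and `j` loops of a matrix multiply and stores each matrix with a padded leading dimension (intra-variable padding). It is a minimal sketch under stated assumptions, not the authors' algorithm: the names `tiled_matmul`, `to_padded`, and `from_padded` are hypothetical, and the tile height, tile width, and pad amount are plain parameters here, whereas the paper is concerned with how a compiler chooses them. Python will not show the cache effects; the example only shows the loop and index structure.

```python
def tiled_matmul(a, b, n, tile_h, tile_w, pad=0):
    """Multiply two n x n matrices stored as flat row-major lists.

    Every row occupies ld = n + pad slots. The trailing `pad` slots of
    each row are never read or written; they only change the distance
    between consecutive rows in memory (intra-variable padding).
    """
    ld = n + pad  # padded leading dimension (illustrative assumption)
    c = [0.0] * (n * ld)
    # Tile the k and j loops so that one tile_h x tile_w block of b is
    # reused for every row i before the next block is touched.
    # tile_h != tile_w gives a rectangular tile.
    for kk in range(0, n, tile_h):
        k_end = min(kk + tile_h, n)
        for jj in range(0, n, tile_w):
            j_end = min(jj + tile_w, n)
            for i in range(n):
                for k in range(kk, k_end):
                    a_ik = a[i * ld + k]
                    for j in range(jj, j_end):
                        c[i * ld + j] += a_ik * b[k * ld + j]
    return c


def to_padded(rows, pad):
    """Flatten a list of rows, appending `pad` unused slots to each row."""
    flat = []
    for row in rows:
        flat.extend(row)
        flat.extend([0.0] * pad)
    return flat


def from_padded(flat, n, pad):
    """Recover the n x n matrix from a padded flat list."""
    ld = n + pad
    return [flat[i * ld:i * ld + n] for i in range(n)]
```

Calling `tiled_matmul(to_padded(A, pad), to_padded(B, pad), n, tile_h, tile_w, pad)` returns the same product for any pad and tile shape. Only the memory layout and traversal order change, which is what lets a compiler tune them against a particular cache.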