Authors: Ozana Silvia Dragomir, Todor Stefanov, Koen Bertels
Keywords:
Abstract: In this article, we present a new technique for optimizing loops that contain kernels mapped on the reconfigurable fabric. We assume the Molen machine organization as our framework. We propose combining loop unrolling with loop shifting, which is used to relocate the function calls contained in the loop body such that, in every iteration of the transformed loop, software functions (running on the GPP) execute in parallel with multiple instances of the kernel (running on the FPGA). The algorithm computes the optimal unroll factor and determines the most appropriate transformation (which can be the combination of unrolling plus shifting, or either of the two). This method is based on profiling information about the kernel’s execution times on the GPP and the FPGA, the memory transfers, and the area utilization. In the experimental part, we apply this method to several kernels from loop nests extracted from real-life applications (DCT and SAD from the MPEG2 encoder, Quantizer from JPEG, and Sobel’s Convolution) and perform an analysis of the results, comparing them with the theoretical maximum speedup given by Amdahl’s Law and showing when and how our transformations are beneficial.
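
The abstract compares measured results against the theoretical maximum speedup given by Amdahl's Law. As a reminder of what that bound looks like, here is a minimal Python sketch of the textbook formula. The function names `amdahl_speedup` and `amdahl_limit` and the sample fractions are hypothetical, chosen for illustration only; they are not taken from the article and do not reproduce its algorithm or measurements.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction of the runtime is accelerated.

    kernel_fraction: share of the original runtime spent in the kernel (0..1).
    kernel_speedup:  factor by which the kernel alone becomes faster (> 0).
    Both parameters are illustrative assumptions, not values from the article.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Textbook Amdahl's Law: the unaccelerated part keeps its full cost,
    # the accelerated part shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical profile: the kernel accounts for 75% of the runtime.
    print(amdahl_speedup(0.75, 8.0))
    print(amdahl_limit(0.75))
```

Under these assumptions, a loop whose kernel takes 75% of the runtime can never run more than four times faster overall, however many kernel instances execute in parallel; this is the kind of ceiling the measured speedups are compared against.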