Authors: Ju-wook Jang, Seonil Choi, Viktor K. Prasanna
Keywords: Parallel computing, Matrix multiplication, Systems design, Dissipation, Algorithmics, Efficient energy use, Circuit design, Field-programmable gate array, Multiplication, Computer science, Simulation
Abstract: We develop new algorithms and architectures for matrix multiplication on configurable devices. These designs significantly reduce the energy dissipation and latency compared with state-of-the-art FPGA-based designs. We derive functions to represent the impact of algorithmic-level design choices on the system-wide energy dissipation, latency, and area by capturing algorithm and architecture details, including features of the target FPGA. The functions are used to optimize energy performance under latency and area constraints for a family of candidate algorithms and architectures. As a result, our designs improve the energy performance of the optimized design from the recent Xilinx library by 32% to 88% without any increase in the area-latency product. In terms of comprehensive metrics such as EAT (Energy-Area-Time) and E/AT (Energy/Area-Time), our designs offer superior performance compared with the Xilinx design by 50%-79% and 13%-44%, respectively. We also address how to exploit further increases in the density of future FPGA devices for asymptotic improvement in latency and energy dissipation for multiplication of larger matrices.