Authors: Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura
DOI: 10.1109/ICCD.2016.7753296
Keywords:
Abstract: Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI chips (ASIC and FPGA) that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. With massively parallel computational units, however, the external memory bandwidth, which is constrained by the chip's pin count, becomes the system bottleneck. Moreover, existing solutions usually lack the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. The proposed 2-level reconfigurability is utilized to enable fast reconfiguration, based on the control logic and the multiboot feature of the FPGA. As a result, a bandwidth requirement of 1.94 MB/GFlop is achieved, 55% lower than prior arts. Under limited DRAM bandwidth, a throughput of 1244 GFlop/s is achieved on the Xilinx Virtex UltraScale platform, 5.48 times higher than state-of-the-art FPGA implementations.
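The two headline figures in the abstract imply a concrete DRAM bandwidth budget. A back-of-envelope check (an illustrative calculation using only the numbers reported above, not an analysis from the paper itself):

```python
# Figures reported in the abstract:
mb_per_gflop = 1.94       # CNN-MERP's external bandwidth requirement (MB/GFlop)
throughput_gflops = 1244  # sustained throughput on the Virtex UltraScale (GFlop/s)

# External memory bandwidth needed to sustain that throughput:
required_mb_per_s = mb_per_gflop * throughput_gflops
print(f"Required DRAM bandwidth: {required_mb_per_s:.0f} MB/s "
      f"(~{required_mb_per_s / 1024:.2f} GB/s)")
```

This suggests roughly 2.4 GB/s of sustained DRAM bandwidth, well within what a single DDR interface provides, which is consistent with the claim that the design operates under limited DRAM bandwidth.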