CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Authors: Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

DOI: 10.1109/ICCD.2016.7753296

Abstract: Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Because CNNs involve huge computational complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. With massively parallel computational units, however, the external memory bandwidth, which is constrained by the chip's pin count, becomes the system bottleneck. Moreover, such solutions usually lack the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. The proposed 2-level reconfigurability is utilized to enable fast reconfiguration, based on control logic and the multiboot feature of the FPGA. As a result, an external memory bandwidth requirement of 1.94 MB/GFlop is achieved, 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved on the Virtex UltraScale platform, 5.48 times higher than state-of-the-art FPGA implementations.
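The headline figures in the abstract can be related with some back-of-envelope arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the derived values are implied by the quoted numbers rather than reported in the paper. It assumes that the 1.94 MB/GFlop requirement and the 1244 GFlop/s throughput hold at the same operating point, that "55% lower" means the prior figure times 0.45, and that "5.48 times higher" means a 5.48x ratio.

```python
# Figures quoted in the abstract.
MB_PER_GFLOP = 1.94        # external memory traffic per GFlop of computation
THROUGHPUT_GFLOPS = 1244.0  # system throughput in GFlop/s
FRACTION_LOWER = 0.55       # "55% lower than prior art"
SPEEDUP = 5.48              # "5.48 times higher" than prior FPGA designs


def implied_dram_bandwidth_mb_s(mb_per_gflop: float, gflops: float) -> float:
    """External bandwidth (MB/s) needed to sustain `gflops`.

    Assumption: both figures describe the same operating point.
    """
    return mb_per_gflop * gflops


def implied_prior_requirement(mb_per_gflop: float, fraction_lower: float) -> float:
    """Prior-art MB/GFlop implied by an 'X% lower' claim (assumed reading)."""
    return mb_per_gflop / (1.0 - fraction_lower)


def implied_prior_throughput(gflops: float, speedup: float) -> float:
    """Prior-art GFlop/s implied by an 'N times higher' claim (assumed reading)."""
    return gflops / speedup


if __name__ == "__main__":
    print(implied_dram_bandwidth_mb_s(MB_PER_GFLOP, THROUGHPUT_GFLOPS))
    print(implied_prior_requirement(MB_PER_GFLOP, FRACTION_LOWER))
    print(implied_prior_throughput(THROUGHPUT_GFLOPS, SPEEDUP))
```

Under these assumptions the implied external traffic is roughly 2.4 GB/s, which suggests why a low per-GFlop bandwidth requirement matters when pin count limits DRAM bandwidth.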
