Reconfigurable Convolutional Kernels for Neural Networks on FPGAs

Authors: Martin Hardieck, Martin Kumm, Konrad Möller, Peter Zipf

DOI: 10.1145/3289602.3293905

Abstract: Convolutional neural networks (CNNs) have gained great success in machine learning applications, and much attention has been paid to their acceleration on field programmable gate arrays (FPGAs). The most demanding computational complexity of CNNs is found in the convolutional layers, which account for 90% of the total operations. The fact that the parameters of the convolutional layers do not change over a long time interval in weight stationary CNNs allows the use of reconfiguration to reduce the resource requirements. This work proposes several alternative reconfiguration schemes that significantly reduce the complexity of sum-of-products operations. The proposed direct configuration schemes provide the least resource requirements and fast reconfiguration times of 32 clock cycles, but require additional memory for the pre-computed configurations. The proposed online reconfiguration scheme uses an online computation of the LUT contents to avoid this memory overhead. Finally, a scheme that duplicates the reconfigurable LUTs is proposed, for which the reconfiguration time can be completely hidden in the computation time. Combined with a few online configuration circuits, this provides the same configuration memory and configuration time as a conventional parallel kernel but offers large resource reductions of up to 80% of the LUTs.
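The abstract does not spell out how the LUT contents are derived from the weights. As a rough illustration only, the sketch below assumes a distributed-arithmetic style sum-of-products, which is one common way to map constant-coefficient sums onto LUTs and is not confirmed here as the authors' exact architecture. The function names `lut_contents` and `sum_of_products`, the unsigned input format, and the bit-serial evaluation are all hypothetical choices made for this example.

```python
def lut_contents(weights):
    """Compute the table for one group of weights (assumed scheme).

    Entry `addr` holds the sum of all weights whose corresponding
    address bit is 1. Recomputing this table for a new weight set is
    the software analogue of "reconfiguring" the LUT.
    """
    n = len(weights)
    return [
        sum(w for i, w in enumerate(weights) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


def sum_of_products(lut, inputs, bits):
    """Evaluate sum(w_i * x_i) for unsigned `bits`-wide inputs.

    Each iteration gathers bit `b` of every input into a LUT address,
    looks up the partial sum, and accumulates it with weight 2**b.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for i, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << i
        acc += lut[addr] << b
    return acc


# Hypothetical usage: switching weights only changes the table,
# not the evaluation logic.
weights_a = [3, -1, 4, 2]
weights_b = [1, 1, -2, 5]
inputs = [5, 7, 1, 6]  # 3-bit unsigned values
result_a = sum_of_products(lut_contents(weights_a), inputs, bits=3)
result_b = sum_of_products(lut_contents(weights_b), inputs, bits=3)
```

In this toy model, storing every table ahead of time corresponds loosely to the pre-computed configurations mentioned in the abstract, while calling `lut_contents` on demand corresponds to computing the contents online; the trade-off between memory and computation is the same in spirit, though the hardware details differ.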
