Authors: Martin Hardieck, Martin Kumm, Konrad Möller, Peter Zipf
Keywords:
Abstract: Convolutional neural networks (CNNs) have gained great success in machine learning applications, and much attention has been paid to their acceleration on field programmable gate arrays (FPGAs). The most demanding computational complexity of CNNs is found in the convolutional layers, which account for about 90% of the total operations. The fact that the parameters of these layers do not change over a long time interval (weight stationary) allows the use of LUT reconfiguration to reduce the resource requirements. This work proposes several alternative reconfiguration schemes that significantly reduce the resources of the sum-of-products computation. The proposed direct configuration schemes provide the least resource requirements and fast reconfiguration times of 32 clock cycles, but require additional memory for pre-computed configurations. An online scheme uses an on-the-fly computation of the LUT contents to avoid this overhead. Finally, by duplicating the reconfigurable LUTs, the reconfiguration time can be completely hidden. Combined with a few additional circuits, this provides the same throughput as a conventional parallel convolution kernel while offering large resource reductions of up to 80% of the LUTs.
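The core idea of LUT-based constant multiplication with pre-computed configurations can be illustrated with a small software model. The sketch below is an assumption-laden simplification, not the paper's actual architecture: it splits an input into 4-bit chunks, looks each chunk up in a table whose contents are pre-computed for one fixed weight (the "configuration"), and sums the shifted partial products; changing the weight means rebuilding the tables, which mirrors the reconfiguration step. All function names are hypothetical.

```python
# Sketch of a LUT-based constant multiplier (weight-stationary idea).
# Names and parameters are illustrative, not from the paper.

def make_luts(weight, in_bits=8, chunk_bits=4):
    """Pre-compute LUT contents for a fixed weight ("configuration").

    Each LUT i maps a chunk value c to (c * weight) << (i * chunk_bits),
    so multiplication reduces to table lookups and additions.
    """
    n_luts = in_bits // chunk_bits
    return [[(chunk * weight) << (i * chunk_bits)
             for chunk in range(1 << chunk_bits)]
            for i in range(n_luts)]

def lut_multiply(x, luts, chunk_bits=4):
    """Multiply x by the weight baked into the LUTs, chunk by chunk."""
    acc = 0
    for i, lut in enumerate(luts):
        chunk = (x >> (i * chunk_bits)) & ((1 << chunk_bits) - 1)
        acc += lut[chunk]
    return acc

# "Reconfiguration": swapping the weight only requires new LUT contents,
# while the lookup-and-add datapath stays unchanged.
luts = make_luts(weight=13)
print(lut_multiply(200, luts))  # → 2600, i.e. 200 * 13
```

In hardware, the tables would live in the FPGA's reconfigurable LUTs, and a sum-of-products kernel would instantiate one such multiplier per weight, feeding a shared adder tree.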