Authors: Yuhang Zhang, Guanghui He, Guoxing Wang, Yongfu Li
DOI: 10.1109/TCAD.2020.2998728
Keywords:
Abstract: The conventional mapping method between the RRAM array and convolutional weights faces two key challenges: 1) nonoptimal energy efficiency and 2) RRAM's temporal variation. To address these challenges, we propose the shift duplicate kernel (SDK) weight architecture. Each kernel is duplicated multiple times and rearranged on different bitlines in a shifted manner, enabling higher intralayer computational parallelism and reducing the number of input data loadings. Hence, this architecture reduces latency and energy consumption in both the forward and backward propagation phases. Furthermore, we have introduced a parallel-window size allocation algorithm and a synchronization method. Our proposed allocation algorithm aims to balance the interlayer pipeline architecture, thus improving the overall area efficiency. The synchronization method uses an averaging scheme to suppress the effect of temporal variation during weight update, enhancing the system's robustness for on-chip training. From our experimental results, our design achieves $\sim 6.8\times$ and $\sim 2.1\times$ improvements over the conventional mapping method. A significant improvement in classification accuracy of 21.7% under 1%–5% temporal variation is achieved for the on-chip training task on the CIFAR-10 dataset.
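To illustrate the idea behind the SDK-style mapping described in the abstract, the sketch below shows one conceptual way a flattened kernel could be duplicated and shifted across bitlines so that a single loaded input segment produces several convolution windows in one crossbar read. This is only a minimal 1-D sketch under our own assumptions (the function name `sdk_map_1d`, the `num_copies` and `stride` parameters, and the dense NumPy matrix standing in for the crossbar are all hypothetical), not the authors' implementation.

```python
# Minimal conceptual sketch of a shift-duplicate-kernel style mapping
# (assumed names and parameters; not the paper's actual implementation).
import numpy as np

def sdk_map_1d(kernel: np.ndarray, num_copies: int, stride: int = 1) -> np.ndarray:
    """Build a (wordlines x bitlines) weight matrix with shifted kernel copies.

    kernel     : flattened 1-D kernel weights (length K)
    num_copies : number of parallel output windows per input load
    stride     : convolution stride (illustrative assumption)
    """
    k = kernel.size
    rows = k + stride * (num_copies - 1)      # wordlines spanned by all shifted copies
    weights = np.zeros((rows, num_copies))    # one bitline per duplicated copy
    for c in range(num_copies):
        offset = c * stride                   # shift each duplicate down the wordlines
        weights[offset:offset + k, c] = kernel
    return weights

# Usage: one input segment drives all wordlines at once, so each bitline's
# dot product equals one convolution window's output, computed in parallel.
kernel = np.array([1.0, -2.0, 0.5])
W = sdk_map_1d(kernel, num_copies=4, stride=1)
x = np.arange(W.shape[0], dtype=float)        # input segment on the wordlines
outputs = x @ W                               # four windows from a single input load
print(outputs)
```

The intended takeaway is that duplicating the kernel trades crossbar area for fewer input loads, which is consistent with the abstract's claim of higher intralayer parallelism at the cost that a separate allocation algorithm must balance area across layers.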