4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Authors: Seongwook Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi

DOI: 10.1109/ISSCC.2015.7062935

Keywords:

Abstract: Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D-image, and motion recognition, use DL due to its best-in-class accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With DL, most of the learning time is spent on massively iterative weight updates, as in a restricted Boltzmann machine (RBM) [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM bandwidth are required. As a result, a 3.4GHz CPU needs >10 hours of learning, and recognition over a ~100K input-vector dataset takes ~1 second, which is far from real-time processing. Thus, DL is typically performed on cloud servers or in high-performance GPU environments with learning-on-server capability. However, the wide spread of smart portable devices, such as smartphones and tablets, results in many applications that need on-device processing and learning, such as tagging private photos on personal devices. An energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition.
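The abstract notes that most DL learning time goes into massively iterative RBM weight updates. A minimal NumPy sketch of one such update, using standard CD-1 contrastive divergence (an illustration of the workload, not the processor's implementation; all names here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 weight update for a binary RBM (biases omitted for brevity).
    W: (n_visible, n_hidden) weights; v0: (batch, n_visible) data batch."""
    # Positive phase: hidden-unit probabilities given the data
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step to reconstruct visible, then hidden, units
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # CD-1 approximation of the log-likelihood gradient, averaged over the batch
    grad = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    return W + lr * grad
```

Each update costs several dense matrix multiplies per batch, and training repeats it over many epochs, which is why the throughput and bandwidth figures quoted above dominate the learning time.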

References (6)
J. G. Liao, "Variance Reduction in Gibbs Sampler Using Quasi Random Numbers," Journal of Computational and Graphical Statistics, vol. 7, pp. 253-266, 1998. DOI: 10.1080/10618600.1998.10474775
Junjie Lu, Steven Young, Itamar Arel, Jeremy Holleman, "A 1TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13μm CMOS," IEEE Journal of Solid-State Circuits, vol. 50, pp. 270-281, 2014. DOI: 10.1109/JSSC.2014.2356197
Phi-Hung Pham, Darko Jelaca, Clement Farabet, Berin Martini, Yann LeCun, Eugenio Culurciello, "NeuFlow: Dataflow vision processing system-on-a-chip," 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1044-1047, 2012. DOI: 10.1109/MWSCAS.2012.6292202
Jung Kuk Kim, Phil Knag, Thomas Chen, Zhengya Zhang, "A 6.67mW sparse coding ASIC enabling on-chip learning and inference," Symposium on VLSI Circuits, pp. 1-2, 2014. DOI: 10.1109/VLSIC.2014.6858385
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 609-616, 2009. DOI: 10.1145/1553374.1553453
Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006. DOI: 10.1162/NECO.2006.18.7.1527