Authors: Seongwook Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi
DOI: 10.1109/ISSCC.2015.7062935
Keywords:
Abstract: Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D image, and motion recognition, use DL due to its best-in-class accuracy. There are 2 types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With DL, most of the time is spent on massively iterative weight updates for the restricted Boltzmann machine (RBM) [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM bandwidth are required. As a result, a 3.4GHz CPU needs >10 hours of training, and recognition over a ~100K input-vector dataset takes ~1 second, which is far from real-time processing. Thus, DL is typically done on cloud servers or in high-performance GPU environments with learning-on-server capability. However, the wide spread of smart portable devices, such as smartphones and tablets, gives rise to many applications that need on-device processing and learning, such as tagging private photos on personal devices. An energy-efficient DL/DI (deep inference) processor is therefore required to realize user-centric pattern recognition in portable devices.
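The abstract's training-time claim can be sanity-checked with simple arithmetic. A minimal sketch, assuming the quoted >100 TOP workload runs at roughly one effective operation per cycle on the 3.4GHz CPU (an illustrative assumption, not a figure from the paper):

```python
# Back-of-envelope check of the ">10 hours" training-time claim.
# Assumption: ~1 effective operation per CPU cycle (illustrative only).
total_ops = 100e12      # >100 TOP total training workload (from the abstract)
cpu_freq_hz = 3.4e9     # 3.4GHz CPU (from the abstract)
hours = total_ops / cpu_freq_hz / 3600.0
print(f"~{hours:.1f} hours")  # ~8.2 hours; ">100 TOP" pushes this past 10 hours
```

The iterative weight update the abstract names as the bottleneck is the RBM training step. Below is a minimal sketch of the standard one-step contrastive-divergence (CD-1) update, not the paper's exact kernel; the layer sizes, learning rate, and omission of bias terms are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.01, rng=None):
    """One CD-1 weight update for a batch of visible vectors v0.

    W  : (n_visible, n_hidden) weight matrix
    v0 : (batch, n_visible) binary input batch
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities given the data.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # CD-1 approximation of the log-likelihood gradient.
    grad = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    return W + lr * grad

# Usage: 784 visible units (e.g. 28x28 pixels), 500 hidden units.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 500))
batch = (rng.random((64, 784)) < 0.5).astype(np.float64)
W = cd1_update(W, batch, rng=rng)
```

Each such update is dominated by three dense matrix multiplies over the full weight matrix, which is why repeating it over a ~100MB dataset produces the massive operation count and the ~40GB/s IO/SRAM bandwidth demand the abstract cites.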