Memory bandwidth management for deep learning applications

Authors: Frank Torsten Bernd Seide, Ray A. Bittner

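Abstract: In a data center, neural network evaluations can be included for services involving image or speech recognition by using a field programmable gate array (FPGA) or other parallel processor. The memory bandwidth limitations of providing weighted data sets from an external memory to the FPGA (or other parallel processor) can be managed by queuing up input data from the plurality of cores executing the services at the FPGA (or other parallel processor) in batches of at least two feature vectors. The at least two feature vectors can be at least two observation vectors from a same data stream or from different data streams. The FPGA (or other parallel processor) can then act on the batch of data for each loading of the weighted datasets.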
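
The batching idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `BatchedEvaluator`, its methods, and the `weight_loads` counter are hypothetical names introduced here, the "external memory fetch" is only simulated by a counter, and the layers are plain matrix-vector products with no nonlinearity. The point it shows is that each layer's weights are fetched once per batch rather than once per feature vector.

```python
from collections import deque


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class BatchedEvaluator:
    """Hypothetical model of an accelerator with weights in external memory."""

    def __init__(self, layers, batch_size=2):
        if batch_size < 2:
            raise ValueError("the abstract describes batches of at least two vectors")
        self.layers = layers          # one weight matrix per layer
        self.batch_size = batch_size
        self.queue = deque()          # input vectors queued from any stream
        self.weight_loads = 0         # simulated external-memory fetches

    def submit(self, stream_id, vector):
        """Queue one feature vector; vectors may come from different streams."""
        self.queue.append((stream_id, vector))

    def _load_weights(self, layer_index):
        """Stand-in for the expensive transfer of a weight set (assumption)."""
        self.weight_loads += 1
        return self.layers[layer_index]

    def run(self):
        """Evaluate every full batch currently in the queue."""
        results = []
        while len(self.queue) >= self.batch_size:
            batch = [self.queue.popleft() for _ in range(self.batch_size)]
            stream_ids = [sid for sid, _ in batch]
            activations = [vec for _, vec in batch]
            for layer_index in range(len(self.layers)):
                # One fetch per layer serves every vector in the batch.
                weights = self._load_weights(layer_index)
                activations = [matvec(weights, a) for a in activations]
            results.extend(zip(stream_ids, activations))
        return results


# Two layers: identity, then a doubling matrix.
layers = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
evaluator = BatchedEvaluator(layers, batch_size=2)
evaluator.submit("stream-a", [1, 2])
evaluator.submit("stream-b", [3, 4])
outputs = evaluator.run()
print(outputs)
print(evaluator.weight_loads)
```

In this toy run the two vectors share two weight fetches (one per layer); evaluating them one at a time would need four. Under this simplified model, the weight traffic per input vector falls in proportion to the batch size.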
