Memory bandwidth management for deep learning applications

Authors: Frank Torsten Bernd Seide, Ray A. Bittner

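Abstract: In a data center, neural network evaluations can be included for services involving image or speech recognition by using a field programmable gate array (FPGA) or other parallel processor. The memory bandwidth limitations of providing weighted data sets from an external memory to the FPGA (or other parallel processor) can be managed by queuing up input data from the plurality of cores executing the services at the FPGA (or other parallel processor) in batches of at least two feature vectors. The at least two feature vectors can be at least two observation vectors from a same data stream or from different data streams. The FPGA (or other parallel processor) can then act on the batch of data for each loading of the weighted datasets.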
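The batching idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `ExternalMemory` class, the `evaluate_batched` function, and the load counter are hypothetical names introduced here, the network is reduced to plain matrix-vector products with no activation function, and a Python list stands in for the input queue. The sketch only shows that fetching each layer's weights once per batch, rather than once per feature vector, reduces the number of weight loads from external memory.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class ExternalMemory:
    """Hypothetical stand-in for off-chip memory holding per-layer weights."""

    def __init__(self, layers: List[Matrix]) -> None:
        self._layers = layers
        self.loads = 0  # counts how many times a weight matrix was fetched

    @property
    def num_layers(self) -> int:
        return len(self._layers)

    def load_weights(self, layer_index: int) -> Matrix:
        # Each call models one transfer of a weight set over the memory bus.
        self.loads += 1
        return self._layers[layer_index]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product (activation omitted for brevity)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def evaluate_batched(memory: ExternalMemory,
                     queue: List[Vector],
                     batch_size: int) -> List[Vector]:
    """Evaluate queued feature vectors, loading each layer once per batch."""
    outputs: List[Vector] = []
    for start in range(0, len(queue), batch_size):
        batch = queue[start:start + batch_size]
        for layer in range(memory.num_layers):
            weights = memory.load_weights(layer)  # one fetch per layer per batch
            batch = [matvec(weights, x) for x in batch]  # reuse for every vector
        outputs.extend(batch)
    return outputs


if __name__ == "__main__":
    layers = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    queue = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

    unbatched = ExternalMemory(layers)
    evaluate_batched(unbatched, queue, batch_size=1)

    batched = ExternalMemory(layers)
    evaluate_batched(batched, queue, batch_size=2)

    print("weight loads, batch size 1:", unbatched.loads)
    print("weight loads, batch size 2:", batched.loads)
```

With two layers and four queued vectors, a batch size of 1 fetches weights eight times while a batch size of 2 fetches them four times, and the outputs are identical. The vectors in a batch may come from one stream or from several, which is why the sketch treats the queue as a flat list.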

References (26)

Ulrich Rückert, Madhura Purnaprajna, Mario Porrmann, Christopher Pohl, Using Run-time Reconfiguration for Energy Savings in Parallel Data Processing. Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA'09, July 13-16, 2009, Las Vegas, Nevada, USA. pp. 119-125 (2009)

Frank Torsten Bernd Seide, Dong Yu, Adam C. Eversole, Xie Chen, Gang Li, Deep neural networks training for speech and pattern recognition (2012)

Marcus Franklin Dutton, Rasterizer packet generator for use in graphics processor (2012)

X. Chen, X. Liu, M.J.F. Gales, P. C. Woodland, Improving the training and evaluation efficiency of recurrent neural network language models. International Conference on Acoustics, Speech, and Signal Processing. pp. 5401-5405 (2015), 10.1109/ICASSP.2015.7179003

M.M. El Choubassi, H.E. El Khoury, C.E.J. Alagha, J.A. Skaf, M.A. Al-Alaoui, Arabic speech recognition using recurrent neural networks. International Symposium on Signal Processing and Information Technology. pp. 543-547 (2003), 10.1109/ISSPIT.2003.1341178

Jungsuk Kim, Jike Chong, Ian Lane, Methods for hybrid gpu/cpu data processing (2013)

Jonghong Kim, Kyuyeon Hwang, Wonyong Sung, X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. International Conference on Acoustics, Speech, and Signal Processing. pp. 7510-7514 (2014), 10.1109/ICASSP.2014.6855060

Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, Mark Oskin, SNNAP: Approximate computing on programmable SoCs via neural acceleration. High-Performance Computer Architecture. pp. 603-614 (2015), 10.1109/HPCA.2015.7056066

Feng Yan, Olatunji Ruwase, Yuxiong He, Trishul Chilimbi, Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems. Knowledge Discovery and Data Mining. pp. 1355-1364 (2015), 10.1145/2783258.2783270

Edward C. Lin, Kai Yu, Rob A. Rutenbar, Tsuhan Chen, A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA. Field Programmable Gate Arrays. pp. 60-68 (2007), 10.1145/1216919.1216928
