Transformer: Run-time reprogrammable heterogeneous architecture for transparent acceleration of dynamic workloads

Authors: Peilong Li, Yan Luo, Jun Yang

DOI: 10.1016/J.JPDC.2015.08.002

Abstract: Heterogeneous architectures face challenges regarding transparent acceleration as well as the allocation of resources to cores and accelerators. The “Transformer”, a run-time reprogrammable, heterogeneous architecture consisting of cores and reconfigurable logic with support for coarse-grained acceleration of dynamic, unpredictable workloads present in mobile cloud computing environments, is the proposed solution. The Transformer allows the instantiation of one or more acceleration functions, with an on-chip reconfigurable logic, which responds to the demands of compute-intensive software libraries. A hardware controller and wrapper functions are designed to profile workloads, reprogram the internal logic, and invoke the appropriate functions. Novel heuristics are derived with respect to accelerator function scheduling. In order to optimize performance and power efficiency, system parameters are explored, including L1 and L2 cache sizes and local buffer sizes. The simulation results indicate that the Transformer provides significant improvements in terms of performance, up to 14× for single-type workloads and up to 2.3× for dynamic workloads, and energy efficiency, up to 6.9× for various workloads.
