A framework for accelerating bottlenecks in GPU execution with assist warps

Authors: Todd C. Mowry, Gennady Pekhimenko, Adwait Jog, Rachata Ausavarungnirun, Chita R. Das

DOI: 10.1016/B978-0-12-803738-6.00015-X

Abstract: Modern graphics processing units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in the utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.
