A framework for accelerating bottlenecks in GPU execution with assist warps

Authors: Todd C. Mowry, Gennady Pekhimenko, Adwait Jog, Rachata Ausavarungnirun, Chita R. Das

Abstract: Modern graphics processing units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in the utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.

Similar articles (5)

Scaling applications on cloud using GPGPU- trends and techniques

Techniques for Shared Resource Management in Systems with Throughput Processors

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency

Optimized Password Recovery based on GPUs for SM3 Algorithm

Enhancing Programmability, Portability, and Performance with Rich Cross-Layer Abstractions.
