Authors: Todd C. Mowry, Gennady Pekhimenko, Adwait Jog, Rachata Ausavarungnirun, Chita R. Das
DOI: 10.1016/B978-0-12-803738-6.00015-X
Keywords:
Abstract: Modern graphics processing units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in the utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.