Improving Data-Dependent Parallelism in GPUs Through Programmer-Transparent Architectural Support

Author: Amir Ali Abdolrashidi

Abstract: As modern GPU workloads grow larger and more complex, the demand for GPU computational power keeps increasing. Traditionally, GPUs have lacked generalized support for data-dependent parallelism and synchronization. In recent years, there have been attempts to introduce more sophisticated synchronization between the kernels of an application, both to control execution flow and to ensure that outputs are correct. However, coarse-grained synchronization between kernels can significantly reduce GPU utilization. Moreover, in a workload with hundreds or thousands of kernels, this overhead can be significant. Because of the GPU's massively parallel design, data can be split among thread blocks, which makes it possible to manage data dependencies at a finer granularity: between the thread blocks themselves rather than between the kernels that contain them. In this dissertation, we propose several methods to …
