GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU

Authors: Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, Youngmin Yi

DOI: 10.1145/3410463.3414656


Abstract: Recent studies have shown promising performance benefits when multiple stages of a pipelined stencil application are mapped to different parts of a GPU and run concurrently. An important factor for the computing efficiency of such pipelines is the granularity of a task. In previous programming frameworks that support true pipelined computations on GPU, this choice has to be made by programmers at development time. Because of the many difficulties involved, programmers' decisions are often far from optimal, causing inferior performance and poor performance portability. This paper presents GOPipe, a granularity-oblivious programming framework for efficient pipelined stencil executions on GPU. With GOPipe, programmers no longer need to specify the appropriate task granularity: GOPipe automatically finds it and dynamically schedules tasks while observing all inter-task and inter-stage data dependencies. In experiments on six real-life applications under various scenarios, GOPipe outperforms the state-of-the-art system by 1.39x on average, with much better programming productivity.
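
To make the idea of dependency-driven scheduling of pipelined stencil tasks concrete, here is a minimal sequential Python sketch. It is not GOPipe's implementation, which runs on GPUs; it assumes a hypothetical pipeline in which every stage applies a 3-point vertical stencil over image rows, and a task is one stage applied to a block of `grain` rows. The names `stencil_row`, `reference`, and `pipelined` are illustrative. A task becomes ready once the previous-stage blocks covering its rows plus a one-row halo have finished, and a priority queue that favors later stages lets downstream stages start before upstream stages have processed the whole image.

```python
import heapq


def stencil_row(prev, i):
    """Hypothetical stage kernel: 3-point vertical average, clamped at the borders."""
    n = len(prev)
    up = prev[max(i - 1, 0)]
    down = prev[min(i + 1, n - 1)]
    return [(a + b + c) / 3.0 for a, b, c in zip(up, prev[i], down)]


def reference(image, stages):
    """Run the stages one after another over the whole image (no pipelining)."""
    cur = image
    for _ in range(stages):
        cur = [stencil_row(cur, i) for i in range(len(cur))]
    return cur


def pipelined(image, stages, grain):
    """Execute (stage, block) tasks as soon as their input rows are available."""
    n = len(image)
    nblocks = (n + grain - 1) // grain
    # buffers[0] is the input image; buffers[s] holds the output of stage s.
    buffers = [list(image)] + [[None] * n for _ in range(stages)]

    def deps(s, b):
        # Block b of stage s reads rows b*grain - 1 .. (b+1)*grain of stage s-1,
        # which lie in blocks b-1, b and b+1 whenever grain >= 1.
        if s == 1:
            return []  # stage 1 reads the input image, which is always ready
        return [(s - 1, d) for d in (b - 1, b, b + 1) if 0 <= d < nblocks]

    pending, dependents = {}, {}
    for s in range(1, stages + 1):
        for b in range(nblocks):
            ds = deps(s, b)
            pending[(s, b)] = len(ds)
            for d in ds:
                dependents.setdefault(d, []).append((s, b))

    # Min-heap keyed on (-stage, block): prefer the deepest ready stage.
    ready = [(-s, b) for (s, b), c in pending.items() if c == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        neg_s, b = heapq.heappop(ready)
        s = -neg_s
        prev, out = buffers[s - 1], buffers[s]
        for i in range(b * grain, min((b + 1) * grain, n)):
            out[i] = stencil_row(prev, i)
        order.append((s, b))
        for t in dependents.get((s, b), []):
            pending[t] -= 1
            if pending[t] == 0:
                heapq.heappush(ready, (-t[0], t[1]))
    return buffers[stages], order
```

For any `grain`, `pipelined(image, stages, grain)` produces exactly the same output as `reference(image, stages)`, because each task only waits on the rows it actually reads. That is what makes the granularity a free parameter as far as correctness is concerned. Choosing a granularity that performs well on a given GPU is the part GOPipe automates, and this sketch does not model it.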
