Authors: Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, Youngmin Yi
Keywords:
Abstract: Recent studies have shown promising performance benefits when multiple stages of a pipelined stencil application are mapped to different parts of a GPU to run concurrently. An important factor for the computing efficiency of such pipelines is the granularity of a task. In previous programming frameworks that support true pipelined computations on GPU, the choice has to be made by programmers during development time. Due to many difficulties, programmers' decisions are often far from optimal, causing inferior performance and portability. This paper presents GOPipe, a granularity-oblivious programming framework for efficient pipelined stencil executions on GPU. With GOPipe, programmers no longer need to specify the appropriate task granularity. GOPipe automatically finds it and dynamically schedules tasks while observing all inter-task and inter-stage data dependencies. In our experiments on six real-life applications and various scenarios, GOPipe outperforms the state-of-the-art system by 1.39X on average with much better productivity.