Authors: Zheng Wang, Daniel Powell, Björn Franke, Michael O'Boyle
DOI: 10.1007/978-3-642-54807-9_9
Keywords:
Abstract: General purpose GPUs provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit GPUs for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCL code from sequential programs. We exploit the fact that dependence profiling provides us with parallel loop candidates that are highly likely to be genuinely parallel, but cannot be statically proven so. For the efficient GPU parallelisation of those probably parallel loop candidates, we propose a novel software speculation scheme, which ensures correctness for the unlikely, yet possible case of dynamically detected dependence violations. Our scheme operates in place and supports speculative read and write operations. We demonstrate the effectiveness of our approach in detecting and exploiting parallelism using sequential codes from the NAS benchmark suite. We achieve an average speedup of 3.2x, and up to 99x, over the sequential baseline. On average, our approach is 1.42 times faster than state-of-the-art speculation schemes and corresponds to 99% of the performance level of a manual GPU implementation developed by independent expert programmers.
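The general idea of speculating on a "probably parallel" loop can be illustrated with a small sketch. This is not the paper's scheme: the function `run_speculative`, the access-tracking dictionaries, and the two example loop bodies are hypothetical names chosen for illustration, the parallel phase is simulated by an ordinary Python loop rather than GPU work-items, and a full snapshot is used for rollback for simplicity, whereas the abstract states that the actual scheme operates in place. The sketch only shows the shape of the technique: run the iterations optimistically, record which iteration read and wrote each element, and fall back to sequential re-execution if a cross-iteration dependence is detected.

```python
def run_speculative(data, n_iters, body):
    """Run body(i, read, write) for each i, checking for cross-iteration dependences.

    Returns True if a violation was detected and the loop was re-run sequentially.
    All names here are illustrative assumptions, not taken from the paper.
    """
    backup = list(data)          # snapshot used for rollback (a simplification)
    readers, writers = {}, {}    # element index -> set of iterations touching it

    def accessors(i):
        def read(idx):
            readers.setdefault(idx, set()).add(i)
            return data[idx]

        def write(idx, val):
            writers.setdefault(idx, set()).add(i)
            data[idx] = val

        return read, write

    # Speculative phase: this loop stands in for independent parallel work-items.
    for i in range(n_iters):
        body(i, *accessors(i))

    # An element written by one iteration and written or read by a different
    # iteration indicates a cross-iteration dependence.
    violated = any(
        len(ws) > 1 or bool(readers.get(idx, set()) - ws)
        for idx, ws in writers.items()
    )

    if violated:
        # Recovery: discard speculative results and re-execute in order.
        data[:] = backup
        for i in range(n_iters):
            body(i, data.__getitem__, data.__setitem__)
    return violated


def double(i, read, write):
    # Each iteration touches only its own element: genuinely parallel.
    write(i, read(i) * 2)


def prefix_sum(i, read, write):
    # Iteration i reads the element written by iteration i - 1: not parallel.
    if i > 0:
        write(i, read(i - 1) + read(i))


a = [1, 2, 3, 4]
a_violated = run_speculative(a, len(a), double)

b = [1, 2, 3, 4]
b_violated = run_speculative(b, len(b), prefix_sum)
```

In this sketch the `double` loop passes the check and keeps its speculative results, while the `prefix_sum` loop is flagged and re-executed sequentially. A real GPU implementation would have to record accesses with atomic operations and would avoid the full-array snapshot.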