Scalar optimizations for shaders

Authors: Yuri Dotsenko, Derek Sessions, Andy Glaister, Blaise Pascal Tine, Mikhail Lyapunov

Abstract: Described herein are optimizations of thread loop intermediate representation (IR) code. One embodiment involves an algorithm that uses data-flow analysis to compute the sets of temporary variables that are loaded at the beginning of a thread loop and stored upon exit from it. Another embodiment reduces the trip count of a thread loop in the commonly found case where a piece of compute shader code is executed by a single thread (or by a compiler-analyzable range of threads). In yet another embodiment, thread indices are cached to avoid excessive divisions, which further improves execution speed.
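The abstract does not say how the thread indices are cached. The sketch below shows one plausible form of the idea under stated assumptions. It assumes a thread loop that walks every thread of a 3-D thread group through a flat iteration index. The naive version recovers each thread's (x, y, z) coordinates with division and modulo on every iteration. The cached version instead keeps the coordinates as running counters that advance like an odometer. The function names and the 3-D group shape are hypothetical illustrations, not the patented implementation.

```python
from typing import Iterator, Tuple

ThreadId = Tuple[int, int, int]


def thread_ids_by_division(dim_x: int, dim_y: int, dim_z: int) -> Iterator[ThreadId]:
    """Naive thread loop: derive (x, y, z) from the flat index with // and %."""
    for i in range(dim_x * dim_y * dim_z):
        x = i % dim_x
        y = (i // dim_x) % dim_y
        z = i // (dim_x * dim_y)
        yield (x, y, z)


def thread_ids_cached(dim_x: int, dim_y: int, dim_z: int) -> Iterator[ThreadId]:
    """Cached thread loop (hypothetical sketch): keep (x, y, z) as counters
    and update them incrementally, so no division or modulo is needed."""
    x = y = z = 0
    for _ in range(dim_x * dim_y * dim_z):
        yield (x, y, z)
        # Advance x; on wrap-around, carry into y, then into z.
        x += 1
        if x == dim_x:
            x = 0
            y += 1
            if y == dim_y:
                y = 0
                z += 1
```

Both generators yield the same sequence of thread coordinates. The cached variant replaces the per-iteration division and modulo operations with an increment and one or two comparisons. Division is typically much more expensive than these on CPUs.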
