A Performance and Energy Consumption Analytical Model for GPU

Authors: Cheng Luo, Reiji Suda

DOI: 10.1109/DASC.2011.117

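Abstract: Even with powerful parallel hardware, it is still difficult to improve application performance and reduce energy consumption without identifying the bottlenecks of parallel programs on GPU architectures. To give programmers better insight into the performance and energy-saving bottlenecks of parallel applications on GPU architectures, we propose two models: an execution time prediction model and an energy consumption prediction model. The execution time prediction model (ETPM) estimates the execution time of massively parallel programs, taking instruction-level and thread-level parallelism into consideration. ETPM contains two components: a memory sub-model and a computation sub-model. The memory sub-model estimates the cost of memory instructions by considering the number of active threads and the GPU memory bandwidth. Correspondingly, the computation sub-model estimates the cost of computation instructions by considering the number of active threads and the application's arithmetic intensity. We use Ocelot to analyze PTX code and obtain several input parameters for the two sub-models, such as the number of memory transactions and the data size. Based on the two sub-models, the analytical model estimates the cost of each instruction while considering instruction-level and thread-level parallelism, thereby estimating the overall execution time of an application. The energy consumption prediction model (ECPM) estimates total energy consumption based on data from ETPM. We compare the outcome of the models with actual execution on a GTX260 and a Tesla C2050. The results show that the models reach almost 90 percent accuracy on average for the benchmarks used.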
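
The abstract describes the structure of the models but gives no equations. As a rough illustration of that structure only, the following Python sketch combines a memory cost and a computation cost into a time estimate and derives energy from it. Every name, formula, and parameter here (`KernelProfile`, `overlap`, `avg_power_watts`, the bandwidth and throughput figures) is a hypothetical assumption made for illustration and is not taken from the paper; in particular, the sketch ignores the active-thread and parallelism effects that ETPM accounts for.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    """Hypothetical kernel inputs, e.g. as might be gathered from PTX analysis."""
    mem_transactions: int        # total number of memory transactions
    bytes_per_transaction: int   # data size of each transaction, in bytes
    compute_instructions: int    # total computation instructions, all threads


def memory_time(p: KernelProfile, bandwidth_bytes_per_s: float) -> float:
    """Assumed memory cost: total bytes moved divided by memory bandwidth."""
    return p.mem_transactions * p.bytes_per_transaction / bandwidth_bytes_per_s


def compute_time(p: KernelProfile, instr_per_s: float) -> float:
    """Assumed computation cost: instruction count divided by throughput."""
    return p.compute_instructions / instr_per_s


def estimate_time(p: KernelProfile, bandwidth_bytes_per_s: float,
                  instr_per_s: float, overlap: float = 1.0) -> float:
    """Combine both costs.

    overlap = 1.0 assumes memory and computation overlap fully (the longer
    one dominates); overlap = 0.0 assumes they run strictly one after the
    other. This knob is an illustrative assumption, not the paper's model.
    """
    t_mem = memory_time(p, bandwidth_bytes_per_s)
    t_cmp = compute_time(p, instr_per_s)
    return max(t_mem, t_cmp) + (1.0 - overlap) * min(t_mem, t_cmp)


def estimate_energy(time_s: float, avg_power_watts: float) -> float:
    """Assumed energy model: constant average power multiplied by time."""
    return avg_power_watts * time_s


if __name__ == "__main__":
    profile = KernelProfile(mem_transactions=1_000_000,
                            bytes_per_transaction=128,
                            compute_instructions=2_000_000_000)
    t = estimate_time(profile, bandwidth_bytes_per_s=128e9, instr_per_s=1e12)
    print(f"estimated time: {t:.6f} s")
    print(f"estimated energy: {estimate_energy(t, avg_power_watts=150.0):.3f} J")
```

The point of the sketch is the dependency order the abstract states: per-instruction-class costs feed a time estimate, and the energy estimate is then computed from that time rather than modeled independently.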
