Importance of explicit vectorization for CPU and GPU software performance

Authors: Neil G. Dickson, Kamran Karimi, Firas Hamze

DOI: 10.1016/J.JCP.2011.03.041

Keywords: Computational science, Graphics processing unit, CPU shielding, Central processing unit, Speedup, Software performance testing, Vectorization (mathematics), Parallel computing, Computer science, CPU modes, CUDA

Abstract: … the CPU and the GPU implementations. Section 3 shows how different parts of the code were vectorized. This section also explains how memory coalescing for GPU was performed. …

References (29)
Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison Wesley Longman Publishing Co., Inc. (1998)
Babak Bagheri, L. Ridgway Scott, Terry Clark, Scientific Parallel Computing. (2005)
Hossein Ahmadi, Maryam Moslemi Naeini, Hamid Sarbazi-azad, Efficient SIMD Numerical Interpolation. High Performance Computing and Communications, pp. 156-165 (2005). DOI: 10.1007/11557654_21
Donald Ervin Knuth, Sorting and Searching. (1973)
George Marsaglia, Wai Wan Tsang, Jingbo Wang, Evaluating Kolmogorov's distribution. Journal of Statistical Software, vol. 8, pp. 1-4 (2003). DOI: 10.18637/JSS.V008.I18
Kamran Karimi, Neil G Dickson, Firas Hamze, Mohammad HS Amin, Marshall Drew-Brook, Fabian A Chudak, Paul I Bunyk, William G Macready, Geordie Rose, Investigating the performance of an adiabatic quantum optimization processor. Quantum Information Processing, vol. 11, pp. 77-88 (2012). DOI: 10.1007/S11128-011-0235-0
Wen-mei W. Hwu, David B. Kirk, Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann (2012)