Toward multi-target autotuning for accelerators

Authors: Nick Chaimov, Boyana Norris, Allen Malony

DOI: 10.1109/PADSW.2014.7097851

Keywords: Computer science, Execution time, Code generation, Compiler, Factor (programming language), Program optimization, CUDA, Programming paradigm, Xeon Phi, Parallel computing, Process (computing), Implementation

Abstract: Producing high-performance implementations from simple, portable computation specifications is a challenge that compilers have tried to address for several decades. …
