Numprof: a performance analysis framework for numerical libraries

Author: Olli-Pekka Lehto

DOI: 10.1007/978-3-642-36803-5_22

Abstract: This paper introduces Numprof, a profiling framework for performance analysis of numerical libraries. The framework consists of a profiler and a replayer for the BLAS and FFTW3 libraries. The profiler records library call events with a user-configurable amount of detail. The replayer can be used to execute library calls based on trace files generated by the profiler. We explore real-world use cases and demonstrate that, due to its low overhead, it is feasible to use the profiler for continuous statistical profiling of library calls.
