Assessing OpenMP tasking implementations on NUMA architectures

Authors: Christian Terboven, Dirk Schmidl, Tim Cramer, Dieter an Mey

DOI: 10.1007/978-3-642-30961-8_14

Abstract: The introduction of task-level parallelization promises to raise the level of abstraction compared to the thread-centric expression of parallelism. However, tasks might exhibit poor performance on NUMA systems if data locality cannot be maintained. In contrast to traditional OpenMP worksharing constructs, for which threads can be bound, the behavior of tasks is much less predetermined by the OpenMP specification, and implementations have a high degree of freedom in implementing task scheduling. Employing different approaches to express task-parallelism, namely the single-producer and parallel-producer patterns with different data initialization strategies, we compare the behavior and quality of OpenMP implementations with task-parallel codes on NUMA architectures. For the programmer, we propose recipes for expressing parallelism with tasks that preserve data locality while optimizing the degree of parallelism. Our proposals are evaluated on reasonably large NUMA systems with both important application kernels and a real-world simulation code.
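
The two task-creation patterns named in the abstract can be illustrated with a minimal C/OpenMP sketch. This is not the authors' benchmark code: the array size `N`, the task granularity `CHUNK`, and the helpers `process_chunk`, `init_parallel`, and `count_mismatches` are hypothetical names chosen for illustration. The parallel initialization shown is only one possible data initialization strategy; it assumes a first-touch page placement policy (the default on Linux), under which a page is placed on the NUMA node of the thread that first writes it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define N     (1 << 16)  /* total number of elements (assumed size) */
#define CHUNK (1 << 10)  /* elements per task (assumed granularity) */

/* Work performed by one task: update one contiguous chunk of the array. */
static void process_chunk(double *a, int begin, int end)
{
    assert(begin >= 0 && end <= N);
    for (int i = begin; i < end; i++)
        a[i] = 2.0 * a[i] + 1.0;
}

/* Parallel initialization: with a first-touch policy, each page ends up
 * on the NUMA node of the thread that writes it first. */
static void init_parallel(double *a)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] = 1.0;
}

/* Single-producer pattern: one thread creates all tasks, the other
 * threads of the team execute them. */
static void single_producer(double *a)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int c = 0; c < N; c += CHUNK) {
                #pragma omp task firstprivate(c)
                process_chunk(a, c, c + CHUNK);
            }
        }   /* implicit barrier: all tasks have finished here */
    }
}

/* Parallel-producer pattern: every thread creates the tasks for its
 * statically assigned share of the chunks, mirroring the static
 * distribution used in init_parallel. */
static void parallel_producer(double *a)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int c = 0; c < N / CHUNK; c++) {
            #pragma omp task firstprivate(c)
            process_chunk(a, c * CHUNK, (c + 1) * CHUNK);
        }   /* implicit barrier: all tasks have finished here */
    }
}

/* Count elements that differ from the expected value (for checking). */
static int count_mismatches(const double *a, double expected)
{
    int bad = 0;
    for (int i = 0; i < N; i++)
        if (a[i] != expected)
            bad++;
    return bad;
}
```

A caller would allocate an array of `N` doubles, run `init_parallel`, and then invoke either producer function. Whether a task actually runs on the NUMA node that holds its chunk is up to the runtime's task scheduler, which is the freedom the abstract refers to; the sketch only controls which thread creates each task. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.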
