Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures

Authors: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Florent Lopez

DOI: 10.1109/HIPC.2015.27

Abstract: Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this paper we first tackle the issue of data partitioning using a method suited for heterogeneous platforms. On the one hand, we design tasks of sufficiently large granularity to obtain a good acceleration factor on the GPU. On the other hand, we limit their size in order to both fit the GPU memory constraints and generate enough parallelism in the task graph. Secondly, we handle the task scheduling with a strategy capable of taking into account workload and architecture heterogeneity at a reduced cost. Finally, we propose an original evaluation of the performance obtained with our solver on a test set of matrices. We show that the proposed approach allows for processing extremely large input problems on GPU-accelerated platforms and that the overall performance is competitive with equivalent state-of-the-art solvers designed and optimized for GPU-only use.

References (27)
George Bosilca, Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach. Scalable Computing and Communications: Theory and Practice (to appear), (2012)
Piyush Sao, Richard Vuduc, Xiaoye Sherry Li, A Distributed CPU-GPU Sparse Direct Solver. Lecture Notes in Computer Science, pp. 487-498, (2014), 10.1007/978-3-319-09873-9_41
Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems. International Parallel and Distributed Processing Symposium, pp. 156-165, (2015), 10.1109/IPDPS.2015.56
Everton Hermann, Bruno Raffin, François Faure, Thierry Gautier, Jérémie Allard, Multi-GPU and multi-CPU parallelization for interactive physics simulations. European Conference on Parallel Processing, vol. 6272, pp. 235-246, (2010), 10.1007/978-3-642-15291-7_23
Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Florent Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems. Euro-Par 2013 Parallel Processing, vol. 8097, pp. 521-532, (2013), 10.1007/978-3-642-40047-6_53
Enrique S. Quintana-Ortí, Gregorio Quintana-Ortí, Jesús Labarta, José R. Herrero, Josep M. Pérez, Rosa M. Badia, Parallelizing dense and banded linear algebra libraries using SMPSs. Concurrency and Computation: Practice and Experience, vol. 21, pp. 2438-2456, (2009), 10.1002/CPE.V21:18
Gregorio Quintana-Ortí, Enrique S. Quintana-Ortí, Robert A. Van De Geijn, Field G. Van Zee, Ernie Chan, Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software, vol. 36, pp. 1-26, (2009), 10.1145/1527286.1527288
G. A. Geist, E. Ng, Task scheduling for parallel sparse Cholesky factorization. International Journal of Parallel Programming, vol. 18, pp. 291-314, (1990), 10.1007/BF01407861
Emmanuel Agullo, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, Stanimire Tomov, Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. Journal of Physics: Conference Series, vol. 180, p. 012037, (2009), 10.1088/1742-6596/180/1/012037
Kyungjoo Kim, Victor Eijkhout, A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling. ACM Transactions on Mathematical Software, vol. 41, p. 3, (2014), 10.1145/2629641