Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures

Authors: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Florent Lopez

DOI: 10.1109/HIPC.2015.27

Abstract: Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this paper we first tackle the issue of data partitioning using a method suited to heterogeneous architectures. On the one hand, we design tasks of sufficiently large granularity to obtain a good acceleration factor on the GPU. On the other hand, we limit their size in order to both fit the GPU memory constraints and generate enough parallelism in the task graph. Secondly, we handle task scheduling with a strategy capable of taking into account workload and architecture heterogeneity at a reduced cost. Finally, we propose an original evaluation of the performance obtained by our solver on a test set of matrices. We show that the proposed approach allows for processing extremely large input problems on GPU-accelerated platforms and that the overall performance is competitive with equivalent state-of-the-art solvers designed and optimized for GPU-only use.
