Authors: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Florent Lopez
DOI: 10.1109/HIPC.2015.27
Keywords:
Abstract: Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this paper we first tackle the issue of data partitioning using a method suited to heterogeneous platforms. On the one hand, we design tasks of sufficiently large granularity to obtain a good acceleration factor on the GPU. On the other hand, we limit the task size in order both to fit the GPU memory constraints and to generate enough parallelism in the task graph. Secondly, we handle task scheduling with a strategy capable of taking into account workload and architecture heterogeneity at a reduced cost. Finally, we propose an original evaluation of the performance obtained by our solver on a test set of matrices. We show that the proposed approach allows the processing of extremely large input problems on GPU-accelerated platforms and that the overall performance is competitive with equivalent state-of-the-art solvers designed and optimized for GPU-only use.