Parallelizing Big De Bruijn Graph Traversal for Genome Assembly on GPU Clusters.

Authors: Shuang Qiu, Zonghao Feng, Qiong Luo

DOI: 10.1007/978-3-030-18590-9_68

Abstract: De Bruijn graph traversal is a critical step in de novo assemblers. It uses the graph structure to analyze genome sequences and is both memory space intensive and time consuming. To improve efficiency, we develop ParaGraph, which parallelizes De Bruijn graph traversal on a cluster of GPU-equipped computer nodes. With effective vertex partitioning and fine-grained parallel algorithms, ParaGraph utilizes all cores of each CPU and GPU, all CPUs and GPUs in a computer node, and all computer nodes of a cluster. Our results show that ParaGraph is able to traverse billion-node graphs within three minutes on a cluster of six GPU-equipped computer nodes, which is an order of magnitude faster than the state-of-the-art shared memory based assemblers, and more than five times faster than the current distributed assemblers.
