SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

Authors: José M Abuín, Juan C Pichel, Tomás F Pena, Jorge Amigo

DOI: 10.1371/JOURNAL.PONE.0155461

Abstract: Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This has an impact on the DNA sequence alignment process, which nowadays requires mapping billions of small sequences onto a reference genome. As a result, alignment remains the most time-consuming stage in the analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, existing solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The software described in this paper is publicly available at https://github.com/citiususc/SparkBWA under a GPL3 license.
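
The abstract describes a general pattern: split the input reads into independent partitions, run an unmodified aligner on each partition in parallel, and merge the outputs, with the parallel layer treating the aligner as a black box. The Python sketch below illustrates only that pattern under stated assumptions. It is not SparkBWA's implementation. Here concurrent.futures stands in for Spark, the hypothetical toy_align function stands in for BWA (it reports exact-match positions only), and the names split_reads and parallel_align are likewise illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Read = Tuple[str, str]  # (read name, DNA sequence)
Aligner = Callable[[List[Read], str], List[str]]


def toy_align(reads: List[Read], reference: str) -> List[str]:
    """Hypothetical stand-in for an unmodified aligner such as BWA.

    Reports the first exact-match position (0-based) of each read,
    or -1 if the read does not occur. Real aligners tolerate
    mismatches and gaps; this toy does not.
    """
    return [f"{name}\t{reference.find(seq)}" for name, seq in reads]


def split_reads(reads: List[Read], n_parts: int) -> List[List[Read]]:
    """Partition reads into at most n_parts contiguous chunks."""
    size = max(1, -(-len(reads) // n_parts))  # ceiling division, never 0
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def parallel_align(reads: List[Read], reference: str,
                   aligner: Aligner, n_parts: int = 4) -> List[str]:
    """Parallel layer: knows nothing about the aligner's internals."""
    chunks = split_reads(reads, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Each chunk is aligned independently; map() keeps chunk order.
        per_chunk = list(pool.map(lambda chunk: aligner(chunk, reference), chunks))
    # Concatenate chunk outputs, preserving the original read order.
    return [line for chunk_out in per_chunk for line in chunk_out]


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    reads = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "GGGG"),
             ("r4", "ACCA"), ("r5", "CGTT")]
    for line in parallel_align(reads, reference, toy_align, n_parts=2):
        print(line)
```

Because parallel_align receives the aligner only as a callable, replacing toy_align with a different aligner needs no change to the parallel layer, which mirrors the compatibility argument in the abstract. Threads are used here on the assumption that each worker would launch an external aligner process; Spark would instead distribute the partitions across the nodes of a cluster.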
