Authors: Nils Homer, Barry Merriman, Stanley F. Nelson
DOI: 10.1371/JOURNAL.PONE.0007767
Keywords:
Abstract: Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation. Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. Conclusions: We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
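
The two-stage approach outlined in the abstract (look up candidate locations through several independent indexes, then score each candidate with a gapped Smith-Waterman local alignment) can be illustrated with a minimal Python sketch. This is not BFAST's implementation: the function names, the 0/1 mask strings, and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, and the index here is a plain in-memory dictionary rather than a whole-genome index.

```python
from collections import defaultdict


def masked_key(window, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, bit in zip(window, mask) if bit == "1")


def build_index(reference, mask):
    """Map each masked key to the reference positions where it occurs."""
    index = defaultdict(list)
    width = len(mask)
    for pos in range(len(reference) - width + 1):
        index[masked_key(reference[pos:pos + width], mask)].append(pos)
    return index


def candidate_locations(read, indexes):
    """Union of implied read start positions over all (mask, index) pairs.

    A read that fails to hit one index because of an error or variant
    may still hit another index built with a different mask.
    """
    candidates = set()
    for mask, index in indexes:
        width = len(mask)
        for offset in range(len(read) - width + 1):
            key = masked_key(read[offset:offset + width], mask)
            for pos in index.get(key, []):
                candidates.add(pos - offset)
    return candidates


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a against b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical toy data: a 20-base reference and a 10-base read taken
# from reference[5:15] with one substituted base.
reference = "GATTACAGATTTCCGGAACT"
read = "CAGAGTTCCG"
masks = ["1111", "11011"]
indexes = [(mask, build_index(reference, mask)) for mask in masks]
candidates = candidate_locations(read, indexes)
```

Each candidate start position would then be scored by running `smith_waterman_score` on the read and the surrounding reference window, keeping the highest-scoring location. Real aligners typically use affine gap penalties and report the alignment itself, not only the score.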