Identifying Taxonomic Units in Metagenomic DNA Streams

Authors: Vicky Zheng, Ahmet Erdem Sariyuce, Jaroslaw Zola

DOI: 10.1101/2020.08.21.261313

Abstract: With the emergence of portable DNA sequencers, such as the Oxford Nanopore Technology MinION, metagenomic sequencing can be performed in real time and directly in the field. However, because metagenomic analysis is computationally and memory intensive, and current methods are designed for batch processing, existing tools are not well suited for mobile devices. In this paper, we propose a new memory-efficient method to identify Operational Taxonomic Units (OTUs) in metagenomic DNA streams. Our method is based on finding connected components in overlap graphs constructed over a stream of long reads produced by the MinION platform. We propose an efficient algorithm to maintain connected components when the overlap graph is streamed, and show how redundant information can be removed from the stream using transitive closures. Through experiments on simulated and real-world data, we demonstrate that the resulting solution is able to recover OTUs with high precision while remaining suitable for mobile computing devices.
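To make the idea of maintaining connected components over a streamed overlap graph concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a standard union-find (disjoint-set) structure, reads identified by hypothetical string IDs, and an upstream overlap detector that emits one pair of read IDs per detected overlap. The class name `StreamingComponents` and its methods are illustrative only. An edge whose endpoints already share a component is reported as redundant for connectivity, which is one simple way to think about discarding information implied by transitivity.

```python
class StreamingComponents:
    """Union-find over read IDs; each component is a candidate OTU (illustrative sketch)."""

    def __init__(self):
        self.parent = {}  # read ID -> parent read ID
        self.size = {}    # root read ID -> number of reads in its component

    def find(self, read):
        """Return the representative of the component containing `read`."""
        if read not in self.parent:
            # First time this read is seen in the stream: start a singleton component.
            self.parent[read] = read
            self.size[read] = 1
            return read
        root = read
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[read] != root:
            self.parent[read], read = root, self.parent[read]
        return root

    def add_overlap(self, a, b):
        """Process one streamed overlap edge (a, b).

        Returns True if the edge merged two components, and False if it was
        redundant because a and b were already connected.
        """
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by size keeps the trees shallow.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

    def components(self):
        """Group all reads seen so far by their component representative."""
        groups = {}
        for read in list(self.parent):
            groups.setdefault(self.find(read), set()).add(read)
        return list(groups.values())
```

In this sketch, memory grows with the number of distinct reads rather than the number of overlaps, since redundant edges are never stored. Whether that matches the memory behavior of the method proposed in the paper is an assumption, not something the abstract states.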
