Compression of FASTQ and SAM format sequencing data.

Authors: James K. Bonfield, Matthew V. Mahoney

DOI: 10.1371/JOURNAL.PONE.0059190

Keywords:

Abstract: … The only other code is N, which need not be coded because it always has a quality score of 0 and can be inserted during decoding. We pack either 3 or 4 bases together, whichever …
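The abstract fragment describes two ideas: N needs no code of its own because its quality score is always 0, and several bases are packed together. A minimal sketch of both follows. It is an illustration under stated assumptions, not the authors' implementation: the function names `pack_bases` and `unpack_bases` are hypothetical, the sketch always packs 4 bases per byte at 2 bits each (the source's criterion for choosing 3 or 4 is elided and is not reproduced here), and it assumes that a quality of 0 occurs only at N positions so the decoder can restore N from the quality values alone.

```python
# Hypothetical 2-bit codes for the four real bases; N gets no code of its own.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def pack_bases(seq):
    """Pack a base string into bytes, 4 bases per byte, 2 bits per base.

    N is written as code 0 (a placeholder); the decoder recovers it from
    the quality values instead of from the packed stream.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            code = 0 if base == "N" else BASE_TO_CODE[base]
            byte |= code << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_bases(packed, quals):
    """Unpack bases, inserting N wherever the quality score is 0.

    Assumes quality 0 appears only at N positions (an assumption of this
    sketch). The read length is taken from len(quals).
    """
    bases = []
    for i, q in enumerate(quals):
        code = (packed[i // 4] >> (2 * (i % 4))) & 3
        bases.append("N" if q == 0 else CODE_TO_BASE[code])
    return "".join(bases)
```

For example, an 8-base read such as `"ACGTNACG"` with qualities `[30, 30, 30, 30, 0, 20, 20, 20]` packs into 2 bytes, and `unpack_bases` restores the N at the fifth position from its zero quality score.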

References (38)
Ben Langmead. Aligning Short Sequencing Reads with Bowtie. Current protocols in human genetics, vol. 32 (2010). DOI: 10.1002/0471250953.BI1107S32
P. Deutsch, J.-L. Gailly. ZLIB Compressed Data Format Specification version 3.3. RFC 1950, pp. 1-11 (1996)
Dinesh Bharadia, Himanshu Asnani, Tsachy Weissman, Idoia Ochoa, Mainak Chowdhury, Itai Sharon. Lossy Compression of Quality Values via Rate Distortion Theory. arXiv: Genomics (2012)
A. Kolmogorov. Logical basis for information theory and probability theory. IEEE Transactions on Information Theory, vol. 14, pp. 662-664 (1968). DOI: 10.1109/TIT.1968.1054210
Armando J. Pinho, Diogo Pratas, Sara P. Garcia. GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Research, vol. 40 (2012). DOI: 10.1093/NAR/GKR1124
Niko Popitsch, Arndt von Haeseler. NGC: lossless and lossy compression of aligned high-throughput sequencing data. Nucleic Acids Research, vol. 41 (2013). DOI: 10.1093/NAR/GKS939
David R Bentley, Shankar Balasubramanian, Harold P Swerdlow, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature, vol. 456, pp. 53-59 (2008). DOI: 10.1038/NATURE07517
Muhammad Nazmus Sakib, Jijun Tang, W. Jim Zheng, Chin-Tser Huang. Improving Transmission Efficiency of Large Sequence Alignment/Map (SAM) Files. PLoS ONE, vol. 6, e28251 (2011). DOI: 10.1371/JOURNAL.PONE.0028251
Claude E. Shannon, Warren Weaver, Norbert Wiener. The Mathematical Theory of Communication. Physics Today, vol. 3, pp. 31-32 (1950). DOI: 10.1063/1.3067010
Christos Kozanitis, Chris Saunders, Semyon Kruglyak, Vineet Bafna, George Varghese. Compressing genomic sequence fragments using SlimGene. Journal of Computational Biology, vol. 18, pp. 401-413 (2011). DOI: 10.1089/CMB.2010.0253