High-throughput DNA sequence data compression

Authors: Z. Zhu, Y. Zhang, Z. Ji, S. He, X. Yang

DOI: 10.1093/BIB/BBT087

Abstract: … The transformed quality scores are further compressed using … non-uniform quantization to bin quality scores, where smaller … the compression of reads and quality scores separately from …
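The abstract names non-uniform quantization as a way to bin quality scores. The sketch below illustrates only that general idea. It assumes FASTQ qualities in Sanger Phred+33 encoding (the format described by Cock et al. in the reference list). The bin edges `BIN_EDGES` and representative values `BIN_REPS` are hypothetical; the paper's actual bins are not given in this excerpt. Each score is replaced by its bin's representative value, which shrinks the alphabet. A deterministic mapping like this can never raise the zero-order entropy of the stream, so a later entropy coder has less to encode.

```python
from bisect import bisect_right
from collections import Counter
import math

# Hypothetical bin edges (Phred scores), NOT taken from the paper.
# A score q falls in bin i = number of edges <= q; bins are uneven in width.
BIN_EDGES = [10, 20, 25, 30, 35, 40]
# Hypothetical representative value emitted for each of the 7 bins.
BIN_REPS = [6, 15, 22, 27, 33, 37, 40]

def decode_fastq_quals(line, offset=33):
    """Convert a Phred+33 (Sanger) FASTQ quality string to integer scores."""
    return [ord(c) - offset for c in line]

def encode_fastq_quals(scores, offset=33):
    """Convert integer Phred scores back to a Phred+33 quality string."""
    return "".join(chr(q + offset) for q in scores)

def quantize(scores):
    """Non-uniform quantization: map each score to its bin's representative."""
    return [BIN_REPS[bisect_right(BIN_EDGES, q)] for q in scores]

def entropy_bits(symbols):
    """Zero-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    quals = decode_fastq_quals("II5+#")          # [40, 40, 20, 10, 2]
    print(encode_fastq_quals(quantize(quals)))   # II70'
```

The call `bisect_right` keeps each lookup at O(log k) for k bins. Because the bins are defined only by a list of edges, a different non-uniform scheme can be swapped in without touching the rest of the code.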

References (80)
S. Golomb. Run-length encodings (Corresp.). IEEE Transactions on Information Theory, vol. 12, pp. 399-401 (1966). DOI: 10.1109/TIT.1966.1053907
Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer, Peter M. Rice. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, vol. 38, pp. 1767-1771 (2010). DOI: 10.1093/NAR/GKP1137
Michael L. Metzker. Sequencing technologies: the next generation. Nature Reviews Genetics, vol. 11, pp. 31-46 (2010). DOI: 10.1038/NRG2626
Ben Langmead, Cole Trapnell, Mihai Pop, Steven L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, vol. 10, pp. 1-10 (2009). DOI: 10.1186/GB-2009-10-3-R25
S. C. Sahinalp, U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. Foundations of Computer Science (FOCS), pp. 320-328 (1996). DOI: 10.1109/SFCS.1996.548491
Pierre Baldi, Ryan W. Benz, Daniel S. Hirschberg, S. Joshua Swamidass. Lossless compression of chemical fingerprints using integer entropy codes improves storage and retrieval. Journal of Chemical Information and Modeling, vol. 47, pp. 2098-2109 (2007). DOI: 10.1021/CI700200N
Minh Duc Cao, Trevor I. Dix, Lloyd Allison, Chris Mears. A Simple Statistical Algorithm for Biological Sequence Compression. Data Compression Conference (DCC), pp. 43-52 (2007). DOI: 10.1109/DCC.2007.7
Marcel Margulies, Michael Egholm, William E Altman, Said Attiya, Joel S Bader, Lisa A Bemben, Jan Berka, Michael S Braverman, Yi-Ju Chen, Zhoutao Chen, Scott B Dewell, Lei Du, Joseph M Fierro, Xavier V Gomes, Brian C Godwin, Wen He, Scott Helgesen, Chun He Ho, Gerard P Irzyk, Szilveszter C Jando, Maria LI Alenquer, Thomas P Jarvie, Kshama B Jirage, Jong-Bum Kim, James R Knight, Janna R Lanza, John H Leamon, Steven M Lefkowitz, Ming Lei, Jing Li, Kenton L Lohman, Hong Lu, Vinod B Makhijani, Keith E McDade, Michael P McKenna, Eugene W Myers, Elizabeth Nickerson, John R Nobile, Ramona Plant, Bernard P Puc, Michael T Ronan, George T Roth, Gary J Sarkis, Jan Fredrik Simons, John W Simpson, Maithreyan Srinivasan, Karrie R Tartaro, Alexander Tomasz, Kari A Vogt, Greg A Volkmer, Shally H Wang, Yong Wang, Michael P Weiner, Pengguang Yu, Richard F Begley, Jonathan M Rothberg. Genome sequencing in microfabricated high-density picolitre reactors. Nature, vol. 437, pp. 376-380 (2005). DOI: 10.1038/NATURE03959
P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, vol. 21, pp. 194-203 (1975). DOI: 10.1109/TIT.1975.1055349
A. J. Cox, M. J. Bauer, T. Jakobi, G. Rosone. Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform. Bioinformatics, vol. 28, pp. 1415-1419 (2012). DOI: 10.1093/BIOINFORMATICS/BTS173