SeqCompress: an algorithm for biological sequence compression.

Authors: Muhammad Sardaraz, Muhammad Tahir, Ataul Aziz Ikram, Hassan Bajwa

DOI: 10.1016/J.YGENO.2014.08.007

Keywords:

Abstract: The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amounts of data efficiently. Biological sequence data storage cost has become a noticeable proportion of the total cost in the generation and analysis of sequence data. In particular, the increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, and may go beyond the limit of storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm, SeqCompress, that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm has better compression gain compared to other existing algorithms.
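The abstract names only the two building blocks, a statistical model and arithmetic coding, without giving their details. As a rough illustration of how the two fit together (not the SeqCompress method itself), the sketch below estimates the code length an ideal arithmetic coder would approach when driven by an adaptive order-k context model. The function name `estimated_bits`, the `order` parameter, the add-one smoothing, and the restriction to the four bases A, C, G, T are all assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: only the four unambiguous bases occur


def estimated_bits(seq, order=2):
    """Ideal code length in bits under an adaptive order-k model.

    An arithmetic coder driven by the same model would produce output
    close to this length. Every count starts at 1 (add-one smoothing)
    so that no base ever has probability zero.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    bits = 0.0
    for i, base in enumerate(seq):
        # Context is the previous `order` bases (shorter at the start).
        context = seq[max(0, i - order):i]
        table = counts[context]
        total = sum(table.values())
        # Cost of coding `base` with probability table[base] / total.
        bits += -math.log2(table[base] / total)
        # Update the model after coding, as a decoder could do the same.
        table[base] += 1
    return bits


if __name__ == "__main__":
    seq = "ACGT" * 50
    print("model estimate (bits):", round(estimated_bits(seq), 1))
    print("2-bit baseline (bits):", 2 * len(seq))
```

On a repetitive input the estimate falls well below the fixed 2 bits per base baseline, which is the kind of gain a statistical model is meant to expose; on input with no exploitable structure it stays near the baseline.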

References (24)
Amr A. Sharawi, Nour S. Bakr. DNA Lossless Compression Algorithms: Review. American Journal of Bioinformatics Research, vol. 3, pp. 72-81 (2013).
Kunihiko Sadakane, Toshiko Matsumoto, Hiroshi Imai. Biological sequence compression algorithms. Genome Informatics, vol. 11, pp. 43-52 (2000). DOI: 10.11234/GI1990.11.43
Behshad Behzadi, Fabrice Le Fessant. DNA Compression Challenge Revisited: A Dynamic Programming Approach. Combinatorial Pattern Matching, pp. 190-200 (2005). DOI: 10.1007/11496656_17
Trevor I. Dix, Timothy Edgoose, Lloyd Allison. Compression of Strings with Approximate Repeats. Intelligent Systems in Molecular Biology, vol. 6, pp. 8-16 (1998).
Akihiko Konagaya, Hisahiko Sato, Takashi Yoshioka, Tetsuro Toyoda. DNA Data Compression in the Post Genome Era. Genome Informatics, vol. 12, pp. 512-514 (2001). DOI: 10.11234/GI1990.12.512
Tungadri Bose, Monzoorul Haque Mohammed, Anirban Dutta, Sharmila S Mande. BIND - an algorithm for loss-less compression of nucleotide sequence data. Journal of Biosciences, vol. 37, pp. 785-789 (2012). DOI: 10.1007/S12038-012-9230-6
J. Craig Venter. Multiple personal genomes await. Nature, vol. 464, pp. 676-677 (2010). DOI: 10.1038/464676A
Eric S Lander. Initial impact of the sequencing of the human genome. Nature, vol. 470, pp. 187-197 (2011). DOI: 10.1038/NATURE09792
G. Korodi, I. Tabus, J. Rissanen, J. Astola. DNA sequence compression - Based on the normalized maximum likelihood model. IEEE Signal Processing Magazine, vol. 24, pp. 47-53 (2007). DOI: 10.1109/MSP.2007.273055
Lincoln D Stein. The case for cloud computing in genome informatics. Genome Biology, vol. 11, p. 207 (2010). DOI: 10.1186/GB-2010-11-5-207