A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme

Authors: Deloula Mansouri, Xiaohui Yuan, Abdeldjalil Saidani

DOI: 10.3390/A13040099

Abstract: With the rapid evolution of DNA sequencing technology, a massive amount of genomic data, mainly DNA sequences, is produced every day, demanding ever more storage and bandwidth. Managing, analyzing, and especially storing these large amounts of data has become a major scientific challenge for bioinformatics, and compression has become necessary to overcome it. In this paper, we describe a new reference-free DNA compressor abbreviated as DNAC-SBE. DNAC-SBE is a lossless hybrid compressor that consists of three phases. First, starting from the most frequent base (Bi), the positions of each Bi are replaced with ones and the positions of the other bases that have smaller frequencies than Bi are replaced with zeros. Second, to encode the generated streams, we propose a new single-block encoding scheme (SEB) based on the exploitation of the position of neighboring bits within the block using two different techniques. Finally, the proposed algorithm dynamically assigns the shorter length code to each block. Results show that DNAC-SBE outperforms state-of-the-art compressors and proves its efficiency in terms of special conditions imposed on compressed data, storage space, and data transfer rate, regardless of the file format or the size of the data.
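The first phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that once a base has been encoded, its positions are dropped before the next base is processed, a detail the abstract does not state. The function names `binary_streams` and `rebuild` are hypothetical, and the single-block encoding of the resulting streams (the second and third phases) is not shown.

```python
from collections import Counter


def binary_streams(seq):
    """Split a DNA string into one bit stream per base, most frequent first.

    Assumption (not stated in the abstract): positions of an already
    encoded base are removed before the next base is processed, so the
    least frequent base needs no stream of its own.
    """
    counts = Counter(seq)
    # Descending frequency; ties broken alphabetically for determinism.
    order = sorted(counts, key=lambda b: (-counts[b], b))
    streams = []
    remaining = seq
    for base in order[:-1]:
        # 1 marks the current base, 0 marks any less frequent base.
        streams.append("".join("1" if c == base else "0" for c in remaining))
        remaining = "".join(c for c in remaining if c != base)
    return order, streams


def rebuild(order, streams):
    """Invert binary_streams; assumes at least two distinct bases."""
    # Zeros in the last stream are the positions of the least frequent base.
    current = order[-1] * streams[-1].count("0")
    for base, bits in zip(reversed(order[:-1]), reversed(streams)):
        rest = iter(current)
        current = "".join(base if bit == "1" else next(rest) for bit in bits)
    return current


order, streams = binary_streams("AACGTAAC")
print(order)    # bases sorted by descending frequency
print(streams)  # one bit string per base except the least frequent
print(rebuild(order, streams) == "AACGTAAC")
```

The round trip through `rebuild` shows that the base order and the streams together are enough to recover the input, which is what makes this kind of transformation lossless.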
