A new challenge for compression algorithms: genetic sequences

Authors: Stéphane Grumbach, Fariza Tahi

DOI: 10.1016/0306-4573(94)90014-0

Keywords:

Abstract: Universal data compression algorithms fail to compress genetic sequences. This is due to the specificity of this particular kind of "text." We analyze in some detail the properties of the sequences, which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress-2, to compress the information contained in DNA and RNA sequences, based on the detection of regularities, such as the presence of palindromes. The algorithm combines substitutional and statistical methods and, to the best of our knowledge, leads to the highest compression of DNA. The results, although not satisfactory, give insight into the necessary correlation between compression and comprehension of genetic sequences.
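To make the notion of "regularities" concrete, the following is a minimal Python sketch of the kind of detection the abstract alludes to: finding, at a given position, the longest factor that already occurred earlier in the sequence, either directly (a repeat) or as a reverse complement (a "palindrome" in the genetic sense). It is not the biocompress-2 algorithm itself. The function names, the `min_len` threshold, and the brute-force search are assumptions made purely for illustration, and only the DNA alphabet A, C, G, T is handled.

```python
# Illustrative sketch only; NOT the authors' biocompress-2 implementation.
# All names and the min_len threshold are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_earlier_match(seq: str, pos: int, min_len: int = 4):
    """Find the longest factor starting at `pos` that occurs before `pos`.

    Returns (kind, start, length), or None if no match reaches min_len.
    kind is "repeat" for a direct copy beginning at `start`, or
    "palindrome" for a reverse-complement copy read backwards from `start`.
    Brute-force search, quadratic in the worst case.
    """
    best = None
    n = len(seq)
    for j in range(pos):
        # Direct repeat: seq[j:j+k] == seq[pos:pos+k], without overlap.
        k = 0
        while pos + k < n and j + k < pos and seq[j + k] == seq[pos + k]:
            k += 1
        if k >= min_len and (best is None or k > best[2]):
            best = ("repeat", j, k)

        # Reverse complement: read backwards from j, complementing each base.
        k = 0
        while pos + k < n and j - k >= 0 and seq[pos + k] == COMPLEMENT[seq[j - k]]:
            k += 1
        if k >= min_len and (best is None or k > best[2]):
            best = ("palindrome", j, k)
    return best


if __name__ == "__main__":
    # The second half of this string is the reverse complement of the first.
    print(longest_earlier_match("ACGGTATACCGT", 6))
```

A substitutional compressor could replace such a factor with a short (kind, start, length) reference. When no sufficiently long match exists, a fallback such as a fixed two-bits-per-base code or a statistical model would be needed; the choice of fallback here is likewise an assumption, not a detail taken from the abstract.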

References (18)
Mark Nelson, The Data Compression Book. Henry Holt and Co., Inc. (1991)
Alberto Apostolico, Dany Breslauer, Zvi Galil, Optimal Parallel Algorithms for Periods, Palindromes and Squares (Extended Abstract). International Colloquium on Automata, Languages and Programming, pp. 296-307 (1992), 10.1007/3-540-55719-9_82
Steve G. Oliver et al., The complete DNA sequence of yeast chromosome III. Nature, vol. 357, pp. 38-46 (1992), 10.1038/357038A0
James A. Storer, Data Compression: Methods and Theory. Computer Science Press, Inc. (1987)
M. Delobel, Robert Robbins, Nabil Kamel, Jean Thierry-Mieg, Akira Tsugita, Thomas G. Marr, Data and Knowledge Bases for Genome Mapping: What Lies Ahead? (Panel). Very Large Data Bases, pp. 309- (1991)
T. B. L. Kirkwood, M. S. Waterman, Mathematical Methods for DNA Sequences. Biometrics, vol. 46, pp. 882- (1989), 10.2307/2532117
Welch, A Technique for High-Performance Data Compression. IEEE Computer, vol. 17, pp. 8-19 (1984), 10.1109/MC.1984.1659158
Didier G. Arques, Christian J. Michel, Periodicities in coding and noncoding regions of the genes. Journal of Theoretical Biology, vol. 143, pp. 307-318 (1990), 10.1016/S0022-5193(05)80032-3
Marc Zipstein, Data compression with factor automata. Theoretical Computer Science, vol. 92, pp. 213-221 (1992), 10.1016/0304-3975(92)90144-5