Balancing and clustering of words in the Burrows-Wheeler transform

作者: Antonio Restivo , Giovanna Rosone

DOI: 10.1016/J.TCS.2010.11.040

关键词:

摘要: Compression algorithms based on Burrows-Wheeler transform (BWT) take advantage of the fact that word output BWT shows a local similarity and then turns out to be highly compressible. The aim present paper is study such ''clustering effect'' by using notions methods from Combinatorics Words. notion balance plays central role in our investigation. Empirical observations suggest actually combinatorial property input ensure optimal compression. Moreover, it reasonable assume more balanced is, we have after (and therefore better compression is). This hypothesis here corroborated experiments ''real'' text, entropy as measure degree word. In setting Words, sound confirmation previous given result Mantaci et al. (2003) [27], which states that, case binary alphabet, there an equivalence between circularly words, words having clusterized BWT, conjugates standard words. alphabets size greater than two, no equivalence. last section devoted investigate relationships these notions, other related ones (as, for instance, palindromic richness) general alphabet.

参考文章(37)
James Stuart Tanton, Encyclopedia of Mathematics ,(2005)
Giovanni Manzini, The Burrows-Wheeler Transform: Theory and Practice mathematical foundations of computer science. pp. 34- 47 ,(1999) , 10.1007/3-540-48340-3_4
Giovanni Manzini, Invited Lecture: The Burrows-Wheeler Transform: Theory and Practice mathematical foundations of computer science. pp. 34- 47 ,(1999)
Antonio Restivo, Giovanna Rosone, Balanced Words Having Simple Burrows-Wheeler Transform Developments in Language Theory. pp. 431- 442 ,(2009) , 10.1007/978-3-642-02737-6_35
Haim Kaplan, Elad Verbin, Most Burrows-Wheeler Based Compressors Are Not Optimal Combinatorial Pattern Matching. pp. 107- 118 ,(2007) , 10.1007/978-3-540-73437-6_13
Michelangelo Bucci, Alessandro De Luca, Amy Glen, Luca Q. Zamboni, A new characteristic property of rich words Theoretical Computer Science. ,vol. 410, pp. 2860- 2863 ,(2009) , 10.1016/J.TCS.2008.11.001
Giovanni Manzini, An analysis of the Burrows—Wheeler transform Journal of the ACM. ,vol. 48, pp. 407- 430 ,(2001) , 10.1145/382780.382782
Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Marinella Sciortino, Boosting textual compression in optimal linear time Journal of the ACM. ,vol. 52, pp. 688- 713 ,(2005) , 10.1145/1082036.1082043