Compression and Explanation using Hierarchical Grammars

作者: Craig G Nevill-Manning , Ian H Witten

DOI: 10.1093/COMJNL/40.2_AND_3.103

关键词:

摘要: This paper describes an algorithm, called SEQUITUR, that identi"es hierarchical structure in sequences of discrete symbols and uses information for compression. On many practical it performs well at both compression structural inference, producing comprehensible descriptions sequence the form grammar rules. The algorithm can be stated concisely two constraints on a context-free grammar. Inference is performed incrementally, faithfully representing input all times. It implemented ef"ciently operates time approximately linear length. Despite its simplicity ef"ciency, SEQUITUR succeeds inferring range interesting structures from naturally occurring sequences.

参考文章(17)
C.G. Nevill-Manning, I.H. Witten, Linear-time, incremental hierarchy inference for compression data compression conference. pp. 3- 11 ,(1997) , 10.1109/DCC.1997.581951
J. Gerard Wolfp, Language Acquisition and the Discovery of Phrase Structure. Language and Speech. ,vol. 23, pp. 255- 269 ,(1980) , 10.1177/002383098002300303
James A. Storer, Thomas G. Szymanski, Data compression via textual substitution Journal of the ACM. ,vol. 29, pp. 928- 951 ,(1982) , 10.1145/322344.322346
Doron Gottlieb, Steven A. Hagerth, Philippe G. H. Lehot, Henry S. Rabinowitz, A classification of compression methods and their usefulness for a large data processing center Proceedings of the May 19-22, 1975, national computer conference and exposition on - AFIPS '75. pp. 453- 458 ,(1975) , 10.1145/1499949.1500038
Alistair Moffat, Word-based text compression Software - Practice and Experience. ,vol. 19, pp. 185- 198 ,(1989) , 10.1002/SPE.4380190207
Alistair Moffat, Radford M. Neal, Ian H. Witten, Arithmetic coding revisited ACM Transactions on Information Systems. ,vol. 16, pp. 256- 294 ,(1998) , 10.1145/290159.290162
Donald L. Bisdorf, The 371 Chorales of Johann Sebastian Bach Music Educators Journal. ,vol. 53, pp. 113- 114 ,(1967) , 10.1177/002743216705300826
J. Ziv, A. Lempel, A universal algorithm for sequential data compression IEEE Transactions on Information Theory. ,vol. 23, pp. 337- 343 ,(1977) , 10.1109/TIT.1977.1055714
J. Ziv, A. Lempel, Compression of individual sequences via variable-rate coding IEEE Transactions on Information Theory. ,vol. 24, pp. 530- 536 ,(1978) , 10.1109/TIT.1978.1055934