The smallest grammar problem

作者: M. Charikar , E. Lehman , D. Liu , R. Panigrahy , M. Prabhakaran

DOI: 10.1109/TIT.2005.850116

关键词:

摘要: This paper addresses the smallest grammar problem: What is context-free that generates exactly one given string /spl sigma/? a natural question about fundamental object connected to many fields such as data compression, Kolmogorov complexity, pattern identification, and addition chains. Due problem's inherent our objective find an approximation algorithm which finds small for input string. We focus attention on ratio of (and implicitly, worst case behavior) establish provable performance guarantees address shortcomings in classical measure redundancy literature. Our first results are concern hardness approximating problem. Most notably, we show every efficient problem has at least 8569/8568 unless P=NP. then bound ratios several best known grammar-based compression algorithms, including LZ78, B ISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, RE-PAIR. Among these, upper O(n/sup 1/2/). finish by presenting two novel algorithms with exponentially better O(log/sup 3/n) O(log(n/m/sup */)), where m/sup */ size input. The latter highlights connection between LZ77.

参考文章(38)
Philip Gage, A new algorithm for data compression The C Users Journal archive. ,vol. 12, pp. 23- 38 ,(1994)
A. Apostolico, S. Lonardi, Some theory and practice of greedy off-line textual substitution data compression conference. pp. 119- 128 ,(1998) , 10.1109/DCC.1998.672138
Piotr Berman, Marek Karpinski, On Some Tighter Inapproximability Results, Further Improvements Electronic Colloquium on Computational Complexity. ,vol. 5, ,(1998)
Paul Erdös, Remarks on number theory III. On addition chains Acta Arithmetica. ,vol. 6, pp. 77- 81 ,(1960) , 10.4064/AA-6-1-77-81
Carl G. De Marcken, Robert C. Berwick, Unsupervised language acquisition PhDT. ,(1996)
Piotr Berman, Marek Karpinski, On Some Tighter Inapproximability Results international colloquium on automata languages and programming. pp. 200- 209 ,(1998)
James A. Storer, Data compression: methods and theory Computer Science Press, Inc.. ,(1987)
T. Kida, Y. Shibata, M. Takeda, A. Shinohara, S. Arikawa, A unifying framework for compressed pattern matching string processing and information retrieval. pp. 89- 96 ,(1999) , 10.1109/SPIRE.1999.796582
A. Apostolico, S. Lonardi, Compression of biological sequences by greedy off-line textual substitution data compression conference. pp. 143- 152 ,(2000) , 10.1109/DCC.2000.838154
Edward G. Thurber, Efficient Generation of Minimal Length Addition Chains SIAM Journal on Computing. ,vol. 28, pp. 1247- 1263 ,(1999) , 10.1137/S0097539795295663