作者: M. Charikar , E. Lehman , D. Liu , R. Panigrahy , M. Prabhakaran
关键词:
摘要: This paper addresses the smallest grammar problem: What is context-free that generates exactly one given string /spl sigma/? a natural question about fundamental object connected to many fields such as data compression, Kolmogorov complexity, pattern identification, and addition chains. Due problem's inherent our objective find an approximation algorithm which finds small for input string. We focus attention on ratio of (and implicitly, worst case behavior) establish provable performance guarantees address shortcomings in classical measure redundancy literature. Our first results are concern hardness approximating problem. Most notably, we show every efficient problem has at least 8569/8568 unless P=NP. then bound ratios several best known grammar-based compression algorithms, including LZ78, B ISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, RE-PAIR. Among these, upper O(n/sup 1/2/). finish by presenting two novel algorithms with exponentially better O(log/sup 3/n) O(log(n/m/sup */)), where m/sup */ size input. The latter highlights connection between LZ77.