Compressing Inverted Files

作者: Andrew Trotman

DOI: 10.1023/A:1022949613039

关键词:

摘要: Research into inverted file compression has focused on ratio—how small the indexes can be. Compression ratio is important for fast interactive searching. It taken as read, smaller index, faster search. The premise “smaller better” may not be true. To truly build it often necessary to forfeit compression. For lists consisting of only 128 occurrences add overhead. Perhaps list could stored in bytes place words, but must still disk. If minimum disk sector read size 512 and word 4 bytes, then both compressed raw postings would require one seek read. A less efficient technique increase size, decrease load/decompress time, thereby increasing throughput. Examined here are five techniques, Golomb, Elias gamma, delta, Variable Byte Encoding Binary Interpolative Coding. The effect time all measured decompression time. quantitative measure throughput developed performance each method determined.

参考文章(24)
Tzi-cker Chiueh, Srinidhi Varadarajan, SASE: implementation of a compressed text search engine usenix symposium on internet technologies and systems. pp. 23- 23 ,(1997)
S. W. Golomb, Run-length encodings. ,(1966)
A. Moffat, L. Stuiver, Exploiting clustering in inverted file compression data compression conference. pp. 82- 91 ,(1996) , 10.1109/DCC.1996.488313
Gonzalo Navarro, Edleno Silva de Moura, Marden Neubert, Nivio Ziviani, Ricardo Baeza-Yates, Adding Compression to Block Addressing Inverted Indexes Information Retrieval. ,vol. 3, pp. 49- 77 ,(2000) , 10.1023/A:1009934302807
Alistair Moffat, Lang Stuiver, Binary Interpolative Coding for Effective Index Compression Information Retrieval. ,vol. 3, pp. 25- 47 ,(2000) , 10.1023/A:1013002601898
Gennady Antoshenkov, Byte aligned data compression ,(1993)
Theodore Johnson, Performance Measurements of Compressed Bitmap Indices very large data bases. pp. 278- 289 ,(1999)
A Bookstein, St Klein, T Raita, Simple Bayesian Model for Bitmap Compression Information Retrieval. ,vol. 1, pp. 315- 328 ,(2000) , 10.1023/A:1009931317394
A. Bookstein, S.T. Klein, T. Raita, Markov models for clusters in concordance compression Proceedings of IEEE Data Compression Conference (DCC'94). pp. 116- 125 ,(1994) , 10.1109/DCC.1994.305919
Anh Ngoc Vo, Alistair Moffat, Compressed inverted files with reduced decoding overheads international acm sigir conference on research and development in information retrieval. pp. 290- 297 ,(1998) , 10.1145/290941.291011