In situ generation of compressed inverted files

Authors: Alistair Moffat, Timothy A. H. Bell

DOI: 10.1002/(SICI)1097-4571(199508)46:7<537::AID-ASI7>3.0.CO;2-P

Abstract: An inverted index stores, for each term that appears in a collection of documents, a list of the document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a nontrivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disk-based methods require large amounts of temporary file space. This paper describes a new indexing algorithm designed to create compressed inverted indexes in situ. It makes use of simple compression codes for positive integers and an in-place external multi-way mergesort. The technique has been used to invert a two-gigabyte text collection in under 4 hours, using less than 40 megabytes of disk space and 20 megabytes of main memory. © 1995 John Wiley & Sons, Inc.
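
As a rough illustration of two ideas named in the abstract, the sketch below builds a small in-memory inverted index and compresses each list of document numbers with a code for positive integers. It is not the paper's algorithm. The abstract does not say which code is used, so the Elias gamma code here is an assumption, as are storing differences between successive document numbers and all function names. The in-place external multi-way mergesort is not shown.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of 1-based document numbers containing it."""
    index = defaultdict(list)
    for doc_no, text in enumerate(docs, start=1):
        # set() ensures each document number appears at most once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_no)
    return dict(index)


def to_gaps(postings):
    """Replace each document number by its difference from the previous one."""
    return [cur - prev for prev, cur in zip([0] + postings[:-1], postings)]


def from_gaps(gaps):
    """Invert to_gaps by taking running sums."""
    total, postings = 0, []
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def gamma_encode(n):
    """Elias gamma code of a positive integer, returned as a string of '0' and '1'."""
    if n < 1:
        raise ValueError("gamma codes are defined only for positive integers")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(postings):
    """Encode an ascending list of document numbers as a gamma-coded bit string."""
    return "".join(gamma_encode(gap) for gap in to_gaps(postings))


def decompress_postings(bits):
    """Recover the document numbers from a gamma-coded bit string."""
    return from_gaps(gamma_decode_all(bits))
```

Because the document numbers in each list are ascending, every difference is at least 1, which is what makes a code for positive integers applicable. A real implementation would pack the bits into bytes rather than use character strings.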
