Authors: C. Stanfill, R. Thau, D. Waltz
DOI: 10.1145/75334.75345
Keywords:
Abstract: In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, the number of processors, and the cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a database of a given size, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology.