Authors: C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz
DOI: 10.21236/ADA461844
Keywords:
Abstract: Rapid growth in data volume, user base, and data diversity renders Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides a set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with Mosaic and with HTTP, FTP, and Gopher information resources. We discuss the design and implementation of each subsystem, and provide measurements indicating that Harvest can reduce server load, network traffic, and index space requirements significantly compared with previous indexing systems. We also discuss a half dozen indexes we have built using Harvest, underscoring both the customizability and the scalability of the system.