Authors: Hector Garcia-Molina, Junghoo Cho
DOI:
Keywords:
Abstract: In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.