Parallel crawlers

Authors: Junghoo Cho, Hector Garcia-Molina

DOI: 10.1145/511446.511464

Abstract: … type of crawler as a parallel crawler. In this paper we study how we should design a parallel crawler, so … In particular, we believe the following issues make the study of a parallel crawler …

References (19)
B. Pinkerton. Finding What People Want: Experiences with the WebCrawler. Proc. of the Second International WWW Conference (1994).
Soumen Chakrabarti, Martin van den Berg, Byron Dom. Focused crawling: a new approach to topic-specific Web resource discovery. The Web Conference, vol. 31, pp. 1623-1640 (1999). DOI: 10.1016/S1389-1286(99)00052-3
Hector Garcia-Molina, Junghoo Cho. The Evolution of the Web and Implications for an Incremental Crawler. Very Large Data Bases, pp. 200-209 (2000).
Allan Heydon, Marc Najork. Mercator: A scalable, extensible Web crawler. World Wide Web, vol. 2, pp. 219-229 (1999). DOI: 10.1023/A:1019213109274
Junghoo Cho, Hector Garcia-Molina. Synchronizing a database to improve freshness. International Conference on Management of Data, vol. 29, pp. 117-128 (2000). DOI: 10.1145/335191.335391
Albert-László Barabási, Réka Albert. Emergence of Scaling in Random Networks. Science, vol. 286, pp. 509-512 (1999). DOI: 10.1126/SCIENCE.286.5439.509
D. Eichmann. The RBSE spider - Balancing effective search against Web load. Computer Networks and ISDN Systems, vol. 27, pp. 308- (1994). DOI: 10.1016/S0169-7552(94)90151-1
E. G. Coffman, Zhen Liu, Richard R. Weber. Optimal Robot Scheduling for Web Search Engines. Journal of Scheduling, vol. 1, pp. 15-29 (1998). DOI: 10.1002/(SICI)1099-1425(199806)1:1<15::AID-JOS3>3.0.CO;2-K
J. M. Neefe, M. D. Dahlin, D. S. Roselli, D. A. Patterson, R. Y. Wang, T. E. Anderson. Serverless Network File Systems (1995).
Junghoo Cho, Hector Garcia-Molina, Lawrence Page. Efficient crawling through URL ordering. The Web Conference, vol. 30, pp. 161-172 (1998). DOI: 10.1016/S0169-7552(98)00108-1