RankMass Crawler: A Crawler with High PageRank Coverage Guarantee.

Authors: Junghoo Cho, Uri Schonfeld

Abstract: Crawling algorithms have been the subject of extensive research and optimizations, but some important questions remain open. In particular, given the infinite number of pages available on the Web, search-engine operators constantly struggle with the following vexing questions: When can I stop downloading the Web? How many pages should I download to cover “most” of the Web? How can I know I am not missing an important part when I stop? In this paper we provide an answer to these questions by developing a family of crawling algorithms that (1) provide a theoretical guarantee on how much of the “important” part of the Web they will download after crawling a certain number of pages and (2) give a high priority to important pages during a crawl, so that the search engine can index the most important part of the Web first. We prove the correctness of our algorithms by theoretical analysis and evaluate their performance experimentally based on 141 million URLs obtained from the Web. Our experiments demonstrate that even our simple algorithm is effective in downloading important pages early on and provides high “coverage” of the Web with a relatively small number of pages.


