SmartCrawl: a new strategy for the exploration of the hidden web

Authors: Augusto de Carvalho Fontes, Fábio Soares Silva

Keywords: Hyperlink, Web search query, Web intelligence, Web page, Information retrieval, World Wide Web, Site map, Web navigation, Computer science, Web search engine, Web crawler

Abstract: The way current search engines work leaves a large amount of information on the World Wide Web outside their catalogues. Crawlers find pages by following hyperlinks and a few other references, and they ignore HTML forms. In this paper, we propose a search engine prototype that retrieves information behind HTML forms by automatically generating queries for them. We describe its architecture, some implementation details, and an experiment which proves that this information is not indexed by current search engines.
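The abstract contrasts a conventional crawler, which only follows hyperlinks, with a hidden-web crawler that generates queries for HTML forms. The following small sketch illustrates that contrast. It is not SmartCrawl's implementation. The names `LinkAndFormParser` and `generate_queries`, the candidate keyword list, and the restriction to GET forms with free-text inputs are all hypothetical choices made for illustration. The sketch uses only Python's standard library (`html.parser`, `urllib.parse`, `itertools`).

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class LinkAndFormParser(HTMLParser):
    """Hypothetical helper: collects hyperlinks and forms (with text inputs) from one page."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values a link-following crawler would queue
        self.forms = []      # dicts: {"action": str, "method": str, "fields": [names]}
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "form":
            self._current = {
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag == "input" and self._current is not None:
            # Assumption: only named free-text fields are filled; hidden, submit, etc. are skipped.
            if (a.get("type") or "text").lower() in ("text", "search") and a.get("name"):
                self._current["fields"].append(a["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def generate_queries(page_url, html, keywords):
    """Hypothetical: build one GET URL per keyword combination over each form's text fields."""
    parser = LinkAndFormParser()
    parser.feed(html)
    urls = []
    for form in parser.forms:
        if form["method"] != "get" or not form["fields"]:
            continue  # POST forms and forms without text fields are out of scope here
        base = urljoin(page_url, form["action"])
        for values in product(keywords, repeat=len(form["fields"])):
            urls.append(base + "?" + urlencode(dict(zip(form["fields"], values))))
    return urls


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<form action="/search" method="get">'
        '<input type="text" name="q"><input type="submit" value="Go">'
        "</form>"
    )
    parser = LinkAndFormParser()
    parser.feed(page)
    print(parser.links)  # ['/about']  (all a link-only crawler would see)
    print(generate_queries("http://example.com/", page, ["crawler", "deep web"]))
    # ['http://example.com/search?q=crawler', 'http://example.com/search?q=deep+web']
```

Enumerating every keyword combination grows as k^n for k candidate keywords and n text fields. A practical hidden-web crawler would therefore need a more selective way to choose field values than this brute-force product.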
