Indexing the invisible web: a survey

Authors: Yanbo Ru, Ellis Horowitz

DOI: 10.1108/14684520510607579

Abstract: Purpose – The existence and continued growth of the invisible web creates a major challenge for search engines that are attempting to organize all of the material on the web into a form that is easily retrieved by all users. The purpose of this paper is to identify the challenges and problems underlying existing work in this area. Design/methodology/approach – A discussion based on a short survey of prior work, including automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building a content summary for an invisible web site, selecting proper databases, integrating invisible web‐search interfaces, and accessing the performance of an invisible web site. Findings – Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface or examining a portion of the con...

References (38)
Nigel Hamilton. The Mechanics of a Deep Net Metasearch Engine. WWW (Posters) (2003).
Panagiotis G. Ipeirotis, Mehran Sahami, Luis Gravano. Query- vs. Crawling-based Classification of Searchable Web Databases. IEEE Data(base) Engineering Bulletin, vol. 25, pp. 43-50 (2002).
Hai He, Weiyi Meng, Clement Yu, Zonghuan Wu. Wise-integrator: an automatic integrator of web search interfaces for E-commerce. Very Large Data Bases, pp. 357-368 (2003). DOI: 10.1016/B978-012722442-8/50039-2
Budi Yuwono, Dik L Lee. Server Ranking for Distributed Text Retrieval Systems on the Internet. Database Systems for Advanced Applications, pp. 41-50 (1997). DOI: 10.1142/9789812819536_0005
Kevin Chen-Chuan Chang, Bin He, Chengkai Li, Mitesh Patel, Zhen Zhang. Structured databases on the web: observations and implications. International Conference on Management of Data, vol. 33, pp. 61-70 (2004). DOI: 10.1145/1031570.1031584
Ying Zhao, George Karypis. Evaluation of hierarchical clustering algorithms for document datasets. Conference on Information and Knowledge Management, pp. 515-524 (2002). DOI: 10.1145/584792.584877
Juliano Palmieri Lage, Altigran S. da Silva, Paulo B. Golgher, Alberto H.F. Laender. Automatic generation of agents for collecting hidden web pages for data extraction. Data and Knowledge Engineering, vol. 49, pp. 177-196 (2004). DOI: 10.1016/J.DATAK.2003.10.003
Walter L. Warnick, R. L. Scott, Karen J. Spence, Lorrie A. Johnson, Valerie S. Allen, Abe Lederman. Searching the Deep Web: Directed Query Engine Applications at the Department of Energy. D-Lib Magazine, vol. 7 (2001). DOI: 10.1045/JANUARY2001-WARNICK