Domain-oriented Deep Web Data Sources' Discovery and Identification

Authors: Yingjun Li, Tiezheng Nie, Derong Shen, Ge Yu

DOI: 10.1109/APWEB.2010.54

Keywords: Social Semantic Web, Data mining, Information retrieval, Web modeling, Web intelligence, Computer science, Web search query, Data Web, Semantic Web Stack, Web query classification, Semantic similarity

Abstract: As the Deep Web contains a tremendous number of well-structured data sources, how to integrate Deep Web data sources has become a hotspot in current research. Accurately discovering and identifying Deep Web data sources related to a specific domain are key issues. We propose a Domain-Oriented Deep Web data source Discovery method (DO-DWD) and a novel Domain Identification strategy of Deep Web data sources (DIDW). In the discovery stage, we use machine learning algorithms and some heuristic rules to find the query interfaces of Deep Web data sources; in the identification stage, we identify the domain associated with a Deep Web data source by calculating the relevance between the query interface and the domain based on semantic similarity. Finally, we have conducted extensive experiments on a real data set, showing that DO-DWD and DIDW achieve high correctness and accuracy.
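
The identification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' DIDW implementation: the abstract does not specify the similarity measure, so token-level Jaccard overlap stands in for semantic similarity here, and the names `term_similarity`, `relevance`, `identify_domain`, the example domain vocabularies, and the `threshold` value are all hypothetical.

```python
def tokens(text):
    """Lower-case a label and split it into a set of word tokens."""
    return set(text.lower().split())


def term_similarity(a, b):
    """Jaccard overlap of tokens; an assumed stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def relevance(interface_labels, domain_terms):
    """Average, over interface labels, of the best match among domain terms."""
    if not interface_labels or not domain_terms:
        return 0.0
    best = [
        max(term_similarity(label, term) for term in domain_terms)
        for label in interface_labels
    ]
    return sum(best) / len(best)


def identify_domain(interface_labels, domains, threshold=0.5):
    """Return the most relevant domain name, or None if none reaches threshold."""
    if not domains:
        return None
    scores = {
        name: relevance(interface_labels, terms)
        for name, terms in domains.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical domain vocabularies, for illustration only.
DOMAINS = {
    "books": ["title", "author", "isbn", "publisher"],
    "flights": ["departure city", "arrival city", "departure date", "passengers"],
}

book_form = ["Title", "Author", "ISBN"]
print(identify_domain(book_form, DOMAINS))
```

A real system would likely replace `term_similarity` with a lexical-resource or embedding-based measure, and the threshold would need tuning on labeled query interfaces.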
