Research on discovering deep web entries

Authors: Ying Wang, Huilai Li, Wanli Zuo, Fengling He, Xin Wang

DOI: 10.2298/CSIS100322028W

Abstract: Ontology plays an important role in locating Domain-Specific Deep Web contents. This paper therefore presents a novel framework, WFF, for efficiently locating Domain-Specific Deep Web databases based on focused crawling and ontology, by constructing a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC) in a hierarchical fashion. First, WPC discovers potentially interesting pages based on an ontology-assisted focused crawler. Then, FSC analyzes the interesting pages and determines whether these pages subsume searchable forms based on structural characteristics. Lastly, FCC identifies searchable forms that belong to a given domain at the semantic level, and stores the URLs of these Domain-Specific searchable forms in a database. Through a detailed experimental evaluation, the WFF framework not only simplifies the discovering process, but also effectively determines Domain-Specific databases.
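
The three-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term set `ONTOLOGY_TERMS`, the thresholds, the helper names (`PageParser`, `wpc`, `fsc`, `fcc`, `discover`), and the simple keyword and input-type rules are all assumptions standing in for the ontology and the trained classifiers of the paper. The sketch only shows the hierarchical order, in which a page must pass the page-level check before its forms are examined structurally and then semantically.

```python
from html.parser import HTMLParser

# Hypothetical domain vocabulary standing in for an ontology (assumption).
ONTOLOGY_TERMS = {"book", "author", "title", "isbn", "publisher"}


class PageParser(HTMLParser):
    """Collects the page text and, for each <form>, its input types and text."""

    def __init__(self):
        super().__init__()
        self.page_text = []   # all visible text fragments on the page
        self.forms = []       # one dict per form: {"inputs": [...], "text": [...]}
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"inputs": [], "text": []}
        elif tag == "input" and self._current is not None:
            # HTML treats a missing type attribute as a text field.
            self._current["inputs"].append((attrs.get("type") or "text").lower())
            # Field names often carry domain meaning, so keep them as text.
            self._current["text"].append(attrs.get("name") or "")

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        self.page_text.append(data)
        if self._current is not None:
            self._current["text"].append(data)


def tokens(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,:;()!?").lower() for w in text.split()}


def wpc(page_text, min_hits=2):
    """Stage 1 stand-in: keep pages that mention enough domain terms."""
    return len(tokens(page_text) & ONTOLOGY_TERMS) >= min_hits


def fsc(form):
    """Stage 2 stand-in: a form looks searchable if it has a text field
    and no password field (a crude structural rule, assumed here)."""
    return "text" in form["inputs"] and "password" not in form["inputs"]


def fcc(form, min_hits=1):
    """Stage 3 stand-in: the form's labels and field names mention the domain."""
    return len(tokens(" ".join(form["text"])) & ONTOLOGY_TERMS) >= min_hits


def discover(pages):
    """pages maps URL -> HTML. Returns URLs that pass all three stages in order."""
    entries = []
    for url, html in pages.items():
        parser = PageParser()
        parser.feed(html)
        parser.close()
        if not wpc(" ".join(parser.page_text)):
            continue  # rejected at the page level, forms are never inspected
        searchable = [f for f in parser.forms if fsc(f)]
        if any(fcc(f) for f in searchable):
            entries.append(url)
    return entries
```

Calling `discover` on a dictionary of fetched pages returns the URLs that would be stored as deep web entries in this sketch. Running the cheap page-level filter first means that most off-topic pages are discarded before any form analysis, which is the practical point of arranging the classifiers hierarchically.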
