Automatic Classification of Web Databases Using Domain-Dictionaries

Authors: Heidy M. Marin-Castro, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo, Hugo Jair Escalante-Baldera

DOI: 10.1007/978-3-642-39712-7_26

Keywords:

Abstract: The identification, classification and integration of databases on the Web (also called web databases) as information sources is still a great challenge due to their constant growth and diversification. The classification of such web databases according to their application domain is an important step towards the integration of deep web sources. Moreover, given the design and content heterogeneity that exists among different web databases, their automatic classification has become a highly demanded task, requiring techniques that allow web databases to be clustered according to the domains they belong to. In this paper we present a strategy for the automatic classification of web databases based on a new supervised approach. This strategy uses the visible information available in a group of domain-specific Web Query Interfaces (WQIs) to construct a dictionary, or lexicon, that better describes a particular domain of interest. The dictionary is enriched with synonyms. In our experiments, the dictionary was built from a set of randomly selected WQIs. The WQI dictionaries generated in this way showed efficient and competitive results compared against related work.
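
The dictionary-based strategy summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the tokenizer, the overlap score, and the tiny training set are all hypothetical choices made here only to show the idea of building one synonym-enriched dictionary per domain from the visible text of WQIs and assigning a new WQI to the best-matching domain.

```python
import re
from collections import Counter

# Hypothetical synonym table; the abstract does not say which synonym source is used.
SYNONYMS = {
    "author": ["writer"],
    "departure": ["leaving"],
}


def tokenize(text):
    """Lowercase the visible text of a WQI and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(wqi_texts, synonyms=SYNONYMS):
    """Count terms over a domain's training WQIs, then add each term's synonyms."""
    counts = Counter()
    for text in wqi_texts:
        counts.update(tokenize(text))
    # Iterate over a snapshot so synonyms added here are not expanded again.
    for term in list(counts):
        for syn in synonyms.get(term, []):
            counts[syn] += counts[term]
    return counts


def classify(wqi_text, dictionaries):
    """Return the domain whose dictionary gives the highest normalized overlap."""
    tokens = tokenize(wqi_text)

    def score(domain):
        dictionary = dictionaries[domain]
        total = sum(dictionary.values())
        # Assumed scoring rule: summed term weights divided by dictionary size.
        return sum(dictionary[t] for t in tokens) / total if total else 0.0

    return max(dictionaries, key=score)


# Toy training data (invented for illustration, not taken from the paper).
training = {
    "books": ["Title Author ISBN Publisher", "Search by author or title keyword"],
    "airfares": ["From To Departure date Return date Passengers",
                 "Departure city Arrival city"],
}
dictionaries = {domain: build_dictionary(texts) for domain, texts in training.items()}

print(classify("Writer name and book title", dictionaries))  # prints "books"
```

In this sketch the query "Writer name and book title" matches the books dictionary only because "writer" was added as a synonym of "author", which is the role the abstract assigns to synonym enrichment.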

References (17)
David Martin Ward Powers. Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. arXiv: Learning, vol. 2, pp. 37-63 (2011).
Hinrich Schütze, Christopher D. Manning, Prabhakar Raghavan. Introduction to Information Retrieval (2005).
Ling Lin, Lizhu Zhou. Web database schema identification through simple query interface. Lecture Notes in Computer Science, vol. 6162, pp. 18-34 (2009). DOI: 10.1007/978-3-642-14415-8_2
Thomas Kabisch. Extraction and integration of Web query interfaces. Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II (2011). DOI: 10.18452/16398
Lu Jiang, Zhaohui Wu, Qian Feng, Jun Liu, Qinghua Zheng. Efficient deep web crawling using reinforcement learning. Knowledge Discovery and Data Mining, pp. 428-439 (2010). DOI: 10.1007/978-3-642-13657-3_46
Bin He, Tao Tao, Kevin Chen-Chuan Chang. Organizing structured web sources by query schemas: a clustering approach. Conference on Information and Knowledge Management, pp. 22-31 (2004). DOI: 10.1145/1031171.1031178
Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli. Distributional term representations: an experimental comparison. Conference on Information and Knowledge Management, pp. 615-624 (2004). DOI: 10.1145/1031171.1031284
Yanbo Ru, Ellis Horowitz. Indexing the invisible web: a survey. Online Information Review, vol. 29, pp. 249-265 (2005). DOI: 10.1108/14684520510607579
Ying Wang, Huilai Li, Wanli Zuo, Fengling He, Xin Wang, Kerui Chen. Research on discovering deep web entries. Computer Science and Information Systems, vol. 8, pp. 779-799 (2011). DOI: 10.2298/CSIS100322028W
Yiyao Lu, Hai He, Qian Peng, Weiyi Meng, Clement Yu. Clustering e-commerce search engines based on their search interface pages using WISE-cluster. Data and Knowledge Engineering, vol. 59, pp. 231-246 (2006). DOI: 10.1016/J.DATAK.2006.01.010