Research on Automatic Discovery of Deep Web Interfaces

Authors: Feiyue Ye, Hang Yu

DOI: 10.1007/978-3-319-26187-4_14

Keywords:

Abstract: The main way to obtain information from the Deep Web is to submit query conditions through the interfaces it provides, so discovering these interfaces is the first problem that needs to be solved for a Deep Web data integration system. At present, most researchers treat an interface as merely what is defined within the HTML form tag. This paper first proposes the concept of an interface block, then designs a block location method based on page and vision information, and finally treats the judgment of whether a block is a Deep Web interface or not as a special multi-class classification problem, applying an algorithm that combines a C4.5 decision tree with an SVM. The experiment adopts the TEL-8 data sets of UIUC; the findings indicate that the method in this paper achieves an accuracy of 97.30% and has good feasibility and practicability.
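As a rough illustration of the form-tag view of an interface that the abstract describes as common practice, the sketch below collects simple control counts for each HTML form on a page. It is not the paper's method: the interface block location step and the C4.5 plus SVM classifier are not reproduced, and the feature set, the function names, and the hand-written rule standing in for a trained classifier are all assumptions made for this example.

```python
from html.parser import HTMLParser


class FormFeatureParser(HTMLParser):
    """Collects per-form counts of control types (a hypothetical feature set)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one feature dict per <form> element
        self._current = None  # feature dict of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text": 0, "password": 0, "select": 0, "submit": 0}
            self.forms.append(self._current)
        elif self._current is not None:
            if tag == "select":
                self._current["select"] += 1
            elif tag == "input":
                # An <input> without a type attribute defaults to a text box.
                kind = (attrs.get("type") or "text").lower()
                if kind in ("text", "password", "submit"):
                    self._current[kind] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_form_features(html):
    """Return a list of feature dicts, one per form found in the HTML string."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.forms


def looks_like_query_interface(features):
    """Toy rule (assumption): no password field, at least one text box or
    select list, and a submit button. A trained classifier would replace this."""
    has_query_field = features["text"] + features["select"] >= 1
    return features["password"] == 0 and has_query_field and features["submit"] >= 1
```

Under this rule, a login form with a password field would be rejected, while a form with a text box, a select list, and a submit button would be accepted as a candidate query interface.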

References (16)
Denis Shestakov. On building a search interface discovery system. Lecture Notes in Computer Science, vol. 6162, pp. 81-93 (2009). DOI: 10.1007/978-3-642-14415-8_6
Lu Jiang, Zhaohui Wu, Qian Feng, Jun Liu, Qinghua Zheng. Efficient deep web crawling using reinforcement learning. Knowledge Discovery and Data Mining, pp. 428-439 (2010). DOI: 10.1007/978-3-642-13657-3_46
Bin He, Tao Tao, Kevin Chen-Chuan Chang. Organizing structured web sources by query schemas: a clustering approach. Conference on Information and Knowledge Management, pp. 22-31 (2004). DOI: 10.1145/1031171.1031178
Yiyao Lu, Hai He, Hongkun Zhao, Weiyi Meng, Clement Yu. Annotating Search Results from Web Databases. IEEE Transactions on Knowledge and Data Engineering, vol. 25, pp. 514-527 (2013). DOI: 10.1109/TKDE.2011.175
Ying Wang, Huilai Li, Wanli Zuo, Fengling He, Xin Wang, Kerui Chen. Research on discovering deep web entries. Computer Science and Information Systems, vol. 8, pp. 779-799 (2011). DOI: 10.2298/CSIS100322028W
Heidy M. Marin-Castro, Victor J. Sosa-Sosa, Jose F. Martinez-Trinidad, Ivan Lopez-Arevalo. Automatic discovery of Web Query Interfaces using machine learning techniques. Intelligent Information Systems, vol. 40, pp. 85-108 (2013). DOI: 10.1007/S10844-012-0217-4
Michael K. Bergman. White Paper: The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing, vol. 7 (2001). DOI: 10.3998/3336451.0007.104
Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami. QProber: A system for automatic classification of hidden-Web databases. ACM Transactions on Information Systems, vol. 21, pp. 1-41 (2003). DOI: 10.1145/635484.635485
Lifeng Zhou, Hong Wang, Qingsong Xu. Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach. Information-an International Interdisciplinary Journal, vol. 5, pp. 634-651 (2014). DOI: 10.3390/INFO5040634
Luciano Barbosa, Juliana Freire, Altigran Silva. Organizing Hidden-Web Databases by Clustering Visible Web Documents. International Conference on Data Engineering, pp. 326-335 (2007). DOI: 10.1109/ICDE.2007.367878