Identification of Deep Web Entries by Using Neural Network

Authors: Chunming Wu, Xianchun Zou, Baohua Qiang

DOI: 10.1007/978-1-4471-2386-6_72

Abstract: The deep web is the fastest-growing new resource on the Internet, and the establishment of a data integration system for it has become a research focus. Deep web entries, whose automatic identification is the basis of that integration, usually appear in HTML forms. Owing to the subjectivity of form design, the lack of unified construction standards makes it difficult to judge whether or not a form is a deep web entry by heuristics and manually specified rules. Based on the notion of a global schema and on machine learning, this paper proposes an approach to identify deep web entries using a neural network. Through statistics on abundant form data, it provides 14 features to distinguish a query interface from a non-query interface. Experiments on 12 data sets show that the proposed approach achieves higher accuracy, and its use is thus recommended.
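
The general idea in the abstract (turn an HTML form into a numeric feature vector, then let a neural network decide whether it is a query interface) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the paper's 14 features or its network architecture, so the five features, the single hidden layer, and the names `FormFeatureParser`, `extract_features`, `train`, and `predict` are all hypothetical, and the two training forms are toy data rather than the paper's data sets.

```python
import math
import random
from html.parser import HTMLParser


class FormFeatureParser(HTMLParser):
    """Counts a few illustrative form properties (NOT the paper's 14 features)."""

    def __init__(self):
        super().__init__()
        self.counts = {"text": 0, "password": 0, "submit": 0}
        self.selects = 0
        self.method_get = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method attribute is given.
            method = (attrs.get("method") or "get").lower()
            self.method_get = 1 if method == "get" else 0
        elif tag == "select":
            self.selects += 1
        elif tag == "input":
            kind = (attrs.get("type") or "text").lower()
            if kind in self.counts:
                self.counts[kind] += 1


def extract_features(html):
    """Return [text inputs, password inputs, selects, submit buttons, uses GET]."""
    parser = FormFeatureParser()
    parser.feed(html)
    c = parser.counts
    return [c["text"], c["password"], parser.selects, c["submit"], parser.method_get]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w1, b1, w2, b2):
    """One hidden sigmoid layer followed by one sigmoid output unit."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    out = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, out


def train(samples, labels, hidden=3, epochs=2000, lr=0.5, seed=0):
    """Plain stochastic gradient descent with backpropagation (assumed setup)."""
    rng = random.Random(seed)
    n = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h, out = forward(x, w1, b1, w2, b2)
            d_out = out - y  # cross-entropy gradient at the output pre-activation
            for j in range(hidden):
                # Hidden-unit gradient uses w2[j] before it is updated below.
                d_h = d_out * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * d_out * h[j]
                for i in range(n):
                    w1[j][i] -= lr * d_h * x[i]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
    return w1, b1, w2, b2


def predict(html, model):
    """Score in [0, 1]; higher means 'more like a query interface'."""
    return forward(extract_features(html), *model)[1]


# Toy examples: a search form (label 1) and a login form (label 0).
SEARCH_FORM = ('<form action="/search"><input type="text" name="q">'
               '<select name="cat"></select><input type="submit"></form>')
LOGIN_FORM = ('<form method="post"><input type="text" name="u">'
              '<input type="password" name="p"><input type="submit"></form>')

model = train([extract_features(SEARCH_FORM), extract_features(LOGIN_FORM)], [1, 0])
```

Calling `predict(SEARCH_FORM, model)` returns the network's score for a form. With only two training forms the model is not meaningful as a classifier; a realistic experiment would need a labelled corpus of forms and a feature set chosen from the data, as the abstract describes.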
