Towards Deeper Understanding of the Search Interfaces of the Deep Web

Authors: Hai He, Weiyi Meng, Yiyao Lu, Clement Yu, Zonghuan Wu

DOI: 10.1007/S11280-006-0010-9

Abstract: Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many Web applications, such as Web database integration and deep Web crawling, require the construction of the schemas. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression based approach to automatically extract the logical attributes from search interfaces. We also rephrase the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema from any Web search interface. Our experimental results on real search interfaces indicate that this system is highly effective.
