The ontological key: automatically understanding and integrating forms to access the deep Web

Authors: Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi

DOI: 10.1007/S00778-013-0323-0

Keywords:

Abstract: Forms are our gates to the Web. They enable us to access the deep content of Web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data. In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems, OPAL advances the state of the art. For form labeling, it combines features from the text, structure, and visual rendering of a Web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern Web forms, OPAL outperforms previous approaches for form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (>97% accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of OPAL's form interpretations through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.

References (38)
Feng Niu, Ce Zhang, Christopher Ré, Jude W Shavlik. DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference. VLDS, pp. 25-28 (2012).
Isabel Navarrete, Antonio Morales, Guido Sciavicco, M. Antonia Cardenas-Viedma. Spatial reasoning with rectangular cardinal relations. Annals of Mathematics and Artificial Intelligence, vol. 67, pp. 31-70 (2013). DOI: 10.1007/S10472-012-9327-5
Wensheng Wu, AnHai Doan, Clement Yu, Weiyi Meng. Modeling and Extracting Deep-Web Query Interfaces. Advances in Information and Intelligent Systems, pp. 65-90 (2009). DOI: 10.1007/978-3-642-04141-9_4
Jiying Wang, Ji-Rong Wen, Fred Lochovsky, Wei-Ying Ma. Instance-based schema matching for web databases by domain-specific query probing. Very Large Data Bases, pp. 408-419 (2004). DOI: 10.1016/B978-012088469-8.50038-3
Samur Araujo, Qi Gao, Erwin Leonardi, Geert-Jan Houben. Carbon: Domain-Independent Automatic Web Form Filling. Lecture Notes in Computer Science, pp. 292-306 (2010). DOI: 10.1007/978-3-642-13911-6_20
Clement T. Yu, Weiyi Meng, Eduard C. Dragut. Deep Web Query Interface Understanding and Integration (2012).
Ziv Bar-Yossef, Maxim Gurevich. Random sampling from a search engine's index. Journal of the ACM, vol. 55, pp. 1-74 (2008). DOI: 10.1145/1411509.1411514
Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi, Christian Schallhart. OPAL: automated form understanding for the deep web. The Web Conference, pp. 829-838 (2012). DOI: 10.1145/2187836.2187948
Kevin Chen-Chuan Chang, Bin He, Zhen Zhang. Mining semantics for large scale integration on the web: evidences, insights, and challenges. SIGKDD Explorations, vol. 6, pp. 67-76 (2004). DOI: 10.1145/1046456.1046465
Hai He, Weiyi Meng, Yiyao Lu, Clement Yu, Zonghuan Wu. Towards Deeper Understanding of the Search Interfaces of the Deep Web. World Wide Web, vol. 10, pp. 133-155 (2007). DOI: 10.1007/S11280-006-0010-9