Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

Authors: Pierre Senellart, Marilena Oita, Antoine Amarilli

Abstract: Deep Web databases, whose content is presented as dynamically generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; third, we enrich the generic ontology with facts from the deep Web.
