Authors: Pierre Senellart, Marilena Oita, Antoine Amarilli
DOI:
Keywords:
Abstract: Deep Web databases, whose content is presented as dynamically generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.