Authors: Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi
DOI: 10.1007/S00778-013-0323-0
Keywords:
Abstract: Forms are our gates to the Web. They enable us to access the deep content of Web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data. In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems, OPAL advances the state of the art: For form labeling, it combines features from the text, structure, and visual rendering of a Web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern Web forms, OPAL outperforms previous approaches to form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (>97% accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a Datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of OPAL's form interpretations through a lightweight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.