Providing Robust Access to Data in Web Pages

Author: Jerome Robinson

Abstract: Much useful e-commerce information is available on web pages, especially those created by queries to servers. The problem for programs that use this information is how to 'screen-scrape' the data off the page into machine-usable structures. Wrappers use knowledge of the page layout in order to extract data accurately, so they fail if the page format changes. This paper describes a fast method for wrapper production, and also a way to automatically detect format change before it causes data access to fail. The method works for pages that contain collections of items, such as lists, tables, and hierarchical structures. It uses a representation of HTML documents which makes repetitive features apparent. It provides fully automatic wrapper production for one class of pages and rapid interactive production for others.
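The abstract does not specify the document representation or the change-detection test, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a page can be summarised by the root-to-element tag paths that occur more than once (which is where lists and table rows show up), and that a format change can be flagged when paths a wrapper relied on are no longer repeated. The names `TagPathCollector`, `layout_signature`, and `format_changed` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags that are currently open
        self.paths = []  # one path string per element, in document order

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def layout_signature(html):
    """Return the set of tag paths that occur at least twice in the page."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.paths)
    return frozenset(path for path, n in counts.items() if n >= 2)


def format_changed(wrapper_signature, html):
    """True if some repeated path the wrapper depends on is missing from the page."""
    return not wrapper_signature <= layout_signature(html)
```

A wrapper built against a table-based result page would store `layout_signature(page)` at build time and call `format_changed` on each newly fetched page before extracting. Adding more rows leaves the signature unchanged, while a redesign that switches from table rows to list items would be flagged before extraction is attempted.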
