Data extraction and label assignment for web databases

Authors: Jiying Wang, Fred H. Lochovsky

DOI: 10.1145/775152.775179

Keywords:

Abstract: … wrappers to extract data objects from the result pages and restoring the retrieved data into an … 90% correctness for data extraction and around 80% correctness for label assignment). …

References (20)
Frederick H. Lochovsky, Jiying Wang. Wrapper induction based on nested pattern discovery. (2002)
Hector Garcia-Molina, Sriram Raghavan. Integrating Diverse Information Management Systems: A Brief Survey. IEEE Data(base) Engineering Bulletin, vol. 24, pp. 44-52 (2001)
L. Liu, C. Pu, W. Han. XWRAP: an XML-enabled wrapper construction system for Web information sources. International Conference on Data Engineering, pp. 611-621 (2000). DOI: 10.1109/ICDE.2000.839475
Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva. Extracting semi-structured data through examples. Conference on Information and Knowledge Management, pp. 94-101 (1999). DOI: 10.1145/319950.319962
Chun-Nan Hsu, Ming-Tzung Dung. Generating finite-state transducers for semi-structured data extraction from the Web. Information Systems, vol. 23, pp. 521-538 (1998). DOI: 10.1016/S0306-4379(98)00027-1
Daniela Florescu, Alon Levy, Alberto Mendelzon. Database techniques for the World-Wide Web: a survey. International Conference on Management of Data, vol. 27, pp. 59-74 (1998). DOI: 10.1145/290593.290605
Paolo Merialdo, Valter Crescenzi, Giansalvatore Mecca. RoadRunner: Towards Automatic Data Extraction from Large Web Sites. Very Large Data Bases, pp. 109-118 (2001)
Jiying Wang, F.H. Lochovsky. Data-rich section extraction from HTML pages. Web Information Systems Engineering, pp. 313-322 (2002). DOI: 10.1109/WISE.2002.1181667
D. Buttler, Ling Liu, C. Pu. A fully automated object extraction system for the World Wide Web. International Conference on Distributed Computing Systems, pp. 361-370 (2001). DOI: 10.1109/ICDSC.2001.918966