ONDUX

Authors: Eli Cortez, Altigran S. da Silva, Marcos André Gonçalves, Edleno S. de Moura


Abstract: Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g., postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Extraction), a new unsupervised probabilistic approach for IETS. Like other IETS approaches, ONDUX relies on information available in pre-existing data to associate segments of the input string with attributes of a given domain. Unlike other approaches, ONDUX relies on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that explores the sequencing and positioning of attribute values, learned directly on demand from the test data with no previous human-driven training, a feature unique to ONDUX. This gives ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we report on textual sources from different domains, in which ONDUX is compared with a state-of-the-art IETS approach.
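The abstract describes two stages: matching input segments against pre-existing data, then a reinforcement step that learns attribute positioning and sequencing from the test data itself. The Python sketch below illustrates that general idea under simplifying assumptions; it is not ONDUX's actual algorithm. The toy knowledge base `KB`, the comma-based pre-segmentation of records, the scoring function `match_score`, and the unweighted sum used in `reinforce` are all hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict

# Toy knowledge base of pre-existing values per attribute (hypothetical data).
KB = {
    "street": ["main street", "oak avenue"],
    "city": ["springfield", "portland"],
    "state": ["or", "il"],
}
ATTRS = list(KB)


def build_vocab(kb):
    """Map each term to a Counter of how often it occurs under each attribute."""
    vocab = defaultdict(Counter)
    for attr, values in kb.items():
        for value in values:
            for term in value.split():
                vocab[term][attr] += 1
    return vocab


def match_score(segment, attr, vocab):
    """Toy matching score: mean over the segment's terms of the share of each
    term's KB occurrences that fall under `attr` (unknown terms count as 0)."""
    terms = segment.split()
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        occ = vocab.get(term)
        if occ:
            total += occ[attr] / sum(occ.values())
    return total / len(terms)


def label_by_matching(records, vocab):
    """Label each segment with its best-matching attribute, or None if no match."""
    labeled = []
    for segments in records:
        row = []
        for seg in segments:
            best = max(ATTRS, key=lambda a: match_score(seg, a, vocab))
            row.append(best if match_score(seg, best, vocab) > 0 else None)
        labeled.append(row)
    return labeled


def learn_position_sequence(labeled):
    """Count attribute positions and attribute-to-attribute transitions,
    using only the segments the matching step managed to label."""
    pos, seq = defaultdict(Counter), defaultdict(Counter)
    for row in labeled:
        prev = "<start>"
        for i, attr in enumerate(row):
            if attr is None:
                prev = None  # an unlabeled gap breaks the observed sequence
                continue
            pos[i][attr] += 1
            if prev is not None:
                seq[prev][attr] += 1
            prev = attr
    return pos, seq


def prob(counter, attr):
    """Relative frequency of `attr` in `counter` (0.0 if the counter is empty)."""
    n = sum(counter.values())
    return counter[attr] / n if n else 0.0


def reinforce(records, labeled, vocab, pos, seq):
    """Fill unlabeled segments by adding matching, positioning and sequencing evidence."""
    result = []
    for segments, row in zip(records, labeled):
        new_row, prev = [], "<start>"
        for i, (seg, attr) in enumerate(zip(segments, row)):
            if attr is None:
                attr = max(ATTRS, key=lambda a: match_score(seg, a, vocab)
                           + prob(pos.get(i, Counter()), a)
                           + prob(seq.get(prev, Counter()), a))
            new_row.append(attr)
            prev = attr
        result.append(new_row)
    return result


# Unlabeled input records, assumed to be pre-split into segments on commas.
records = [
    ["12 main street", "springfield", "il"],
    ["40 oak avenue", "portland", "or"],
    ["7 elm road", "salem", "or"],
]
vocab = build_vocab(KB)
labeled = label_by_matching(records, vocab)
pos, seq = learn_position_sequence(labeled)
result = reinforce(records, labeled, vocab, pos, seq)
print(labeled[2])  # [None, None, 'state']
print(result[2])   # ['street', 'city', 'state']
```

In this toy run, the first two segments of the third record have no match in `KB`, so matching alone leaves them unlabeled; the position and sequence counts learned from the other two records then assign them to `street` and `city`, with no hand-labeled training data involved. Adding the three scores with equal weight is simply the easiest combination to read, chosen here for illustration only.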

