Authors: Eli Cortez, Altigran S. da Silva, Marcos André Gonçalves, Edleno S. de Moura
Keywords:
Abstract: Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data to associate segments in the input string with attributes of a given domain. Unlike other approaches, we rely on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that explores the sequencing and positioning of attribute values directly learned on-demand from test data, with no previous human-driven training, a feature unique to ONDUX. This gives ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we report with textual sources from different domains, in which ONDUX is compared with a state-of-the-art IETS approach.
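To make the matching idea concrete, the following is a minimal sketch, not the ONDUX algorithm itself. It assumes a tiny hand-written knowledge base of pre-existing attribute values (`KNOWLEDGE_BASE`) and labels each token of an input string with the attribute whose known values contain that token most often, then merges adjacent tokens that share a label into segments. All names, the sample data, and the scoring rule are hypothetical illustrations; the reinforcement step based on sequencing and positioning is not implemented, so unmatched tokens simply stay unlabeled.

```python
from collections import Counter

# Hypothetical pre-existing data: attribute name -> known values for that attribute.
KNOWLEDGE_BASE = {
    "street": ["main street", "oak avenue", "elm road"],
    "city": ["springfield", "portland", "austin"],
    "zip": ["97201", "73301", "62704"],
}


def build_vocab(kb):
    """Count how often each token occurs among the known values of each attribute."""
    return {
        attr: Counter(tok for value in values for tok in value.lower().split())
        for attr, values in kb.items()
    }


def label_tokens(text, vocab):
    """Label each token with the best-matching attribute, or None if it is unseen."""
    labels = []
    for tok in text.lower().replace(",", " ").split():
        # Keep only attributes whose known values contain this token.
        scores = {attr: counts[tok] for attr, counts in vocab.items() if counts[tok] > 0}
        labels.append((tok, max(scores, key=scores.get) if scores else None))
    return labels


def segment(text, kb):
    """Merge consecutive tokens that received the same label into one segment."""
    segments = []
    for tok, attr in label_tokens(text, build_vocab(kb)):
        if segments and segments[-1][0] == attr:
            segments[-1] = (attr, segments[-1][1] + " " + tok)
        else:
            segments.append((attr, tok))
    return segments


if __name__ == "__main__":
    for attr, value in segment("12 Main Street, Portland 97201", KNOWLEDGE_BASE):
        print(attr, "->", value)
```

In this sketch the house number `12` is left with the label `None` because it never occurs in the knowledge base. Resolving such unmatched or ambiguous segments is the role the abstract assigns to the reinforcement step, which would use the order and position of the attributes already matched in the test data.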