Authors: Eno Thereska, Dushyanth Narayanan, Anastassia Ailamaki
DOI:
Keywords:
Abstract: The AVATAR Information Extraction System (IES) at the IBM Almaden Research Center enables high-precision, rule-based information extraction from text documents. Drawing from our experience, we propose the use of probabilistic database techniques as the formal underpinnings of information extraction systems, so as to maintain high precision while increasing recall. This involves building a framework where rule-based annotators can be mapped to queries in a database system. We use examples from AVATAR IES to describe the challenges in achieving this goal. Finally, we show that deriving precision estimates in such a system presents a significant challenge for probabilistic database systems.