Authors: Isabelle Augenstein, Diana Maynard, Fabio Ciravegna
DOI: 10.1007/978-3-319-13704-9_3
Keywords:
Abstract: Extracting information from Web pages requires the ability to work at scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information, with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations, in order to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction, as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains, as well as by extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction results in about 8 times the number of extractions, and that strategically selecting training data can result in an error reduction of about 30%.
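
The labelling step that the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the example sentences and the function name `label_sentences` are all hypothetical, and entity matching is reduced to plain substring search, where a real system would use entity recognition tools and a knowledge base from the Linking Open Data cloud.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation name.
# A real system would draw these facts from a Linked Open Data source.
KB: Dict[Tuple[str, str], str] = {
    ("The Beatles", "Let It Be"): "album",
    ("The Beatles", "Liverpool"): "origin",
}

# Hypothetical example sentences standing in for text from Web pages.
SENTENCES: List[str] = [
    "The Beatles released Let It Be in 1970.",
    "The Beatles were formed in Liverpool.",
    "Let It Be is also the title of a song.",
]


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label a sentence with a relation when it mentions both entities
    of a known fact. Matching is naive substring search (an assumption
    made for brevity)."""
    labelled = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            if subj in sentence and obj in sentence:
                labelled.append((sentence, subj, obj, relation))
    return labelled


training_data = label_sentences(SENTENCES, KB)
for sentence, subj, obj, relation in training_data:
    print(f"{relation}({subj}, {obj}) <- {sentence}")
```

The third sentence receives no label because it mentions only one entity of any known fact. The heuristic is also where the noise mentioned in the abstract comes from: a sentence may contain both entities without expressing the relation, and an ambiguous name such as "Let It Be" can refer to either an album or a song.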