Relation Extraction from the Web Using Distant Supervision

Authors: Isabelle Augenstein, Diana Maynard, Fabio Ciravegna

DOI: 10.1007/978-3-319-13704-9_3

Abstract: Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains, as well as extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction results in about 8 times the number of extractions, and that strategically selecting training data can result in an error reduction of about 30%.
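The core labelling step of distant supervision described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy knowledge base of (subject, relation, object) triples standing in for facts from the Linking Open Data cloud, and it reduces entity recognition to plain substring matching. The function names `ambiguous_pairs` and `distant_label` and all example data are hypothetical. The filter that skips subject/object pairs linked by more than one relation is only a simple illustrative heuristic for label ambiguity; it is not the statistical training-data selection methods the authors propose.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical triple type: (subject, relation, object).
Triple = Tuple[str, str, str]
Example = Tuple[str, str, str, str]  # (sentence, subject, relation, object)


def ambiguous_pairs(kb: List[Triple]) -> Set[Tuple[str, str]]:
    """Return (subject, object) pairs that the KB links by more than one relation.

    A sentence mentioning such a pair cannot be given a single relation label,
    so this sketch (as an illustrative heuristic) excludes it from training data.
    """
    relations: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subj, rel, obj in kb:
        relations[(subj, obj)].add(rel)
    return {pair for pair, rels in relations.items() if len(rels) > 1}


def distant_label(sentences: List[str], kb: List[Triple]) -> List[Example]:
    """Label each sentence with every KB relation whose subject and object it mentions.

    Entity recognition is assumed to be case-sensitive substring matching,
    which keeps the example self-contained but is far cruder than a real NER tool.
    """
    skip = ambiguous_pairs(kb)
    examples: List[Example] = []
    for sentence in sentences:
        for subj, rel, obj in kb:
            if (subj, obj) in skip:
                continue  # ambiguous pair: more than one candidate label
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, rel, obj))
    return examples


if __name__ == "__main__":
    # Hypothetical toy KB and sentences, invented for illustration only.
    kb = [
        ("Alice Smith", "birthPlace", "Sheffield"),
        ("Alice Smith", "residence", "Sheffield"),
        ("Alice Smith", "employer", "Acme Corp"),
    ]
    sentences = [
        "Alice Smith joined Acme Corp in 2010.",
        "Alice Smith still lives in Sheffield.",
        "Acme Corp opened an office in Leeds.",
    ]
    # Only the employer sentence is labelled; the Sheffield pair is skipped as ambiguous.
    print(distant_label(sentences, kb))
```

The sketch labels every sentence that mentions both arguments of a known triple, which is exactly where distant supervision gets its noise: a sentence can mention both entities without expressing the relation. Dropping ambiguous pairs trades recall for cleaner labels, the same trade-off the abstract describes for training-data selection.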