ClausIE

Authors: Luciano Del Corro, Rainer Gemulla

DOI: 10.1145/2488388.2488420

Keywords: Representation (mathematics), Noisy text, Information extraction, Grammar, Natural language processing, Computer science, Sentence, Relationship extraction, Natural language, Artificial intelligence, Dependency grammar

Abstract: We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on the noisy text found on the web.
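
To make the clause-typing step described in the abstract concrete, here is a minimal Python sketch. It is not ClausIE's implementation: the real system derives clauses from a dependency parse and consults lexica, while this toy assumes the constituents of a single clause have already been labeled by hand. The role labels (`S`, `V`, `IO`, `O`, `C`, `A`), the names `clause_type` and `extraction`, and the decision to treat every adverbial as part of the clause type are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical role labels for the constituents of one clause:
# S = subject, V = verb, IO = indirect object, O = direct object,
# C = complement, A = adverbial.
Clause = Dict[str, str]


def clause_type(clause: Clause) -> str:
    """Name the clause pattern from the grammatical roles that are present."""
    if "S" not in clause or "V" not in clause:
        raise ValueError("a clause needs at least a subject and a verb")
    if "O" in clause:
        if "IO" in clause:
            return "SVOO"
        if "C" in clause:
            return "SVOC"
        if "A" in clause:
            return "SVOA"
        return "SVO"
    if "C" in clause:
        return "SVC"
    if "A" in clause:
        return "SVA"
    return "SV"


# Fixed output order for the arguments that follow subject and verb.
ARGUMENT_ORDER = ("IO", "O", "C", "A")


def extraction(clause: Clause) -> Tuple[str, ...]:
    """Turn a labeled clause into a (subject, relation, argument, ...) tuple."""
    clause_type(clause)  # raises if subject or verb is missing
    arguments = tuple(clause[role] for role in ARGUMENT_ORDER if role in clause)
    return (clause["S"], clause["V"]) + arguments


if __name__ == "__main__":
    example = {"S": "Albert Einstein", "V": "died", "A": "in Princeton"}
    print(clause_type(example), extraction(example))
```

Keeping `clause_type` separate from `extraction` mirrors the separation the abstract emphasizes: first decide what information a clause expresses, then choose how to represent it. An application could replace `extraction` with a different output format without touching the typing logic.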
