Authors: Luciano Del Corro, Rainer Gemulla
Keywords: Representation (mathematics), Noisy text, Information extraction, Grammar, Natural language processing, Computer science, Sentence, Relationship extraction, Natural language, Artificial intelligence, Dependency grammar
Abstract: We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on noisy text as found on the web.
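
The separation described in the abstract (first detect clauses and their constituents, then decide how to represent them as extractions) can be illustrated with a small toy sketch. This is not the authors' implementation. The `Clause` class, the helper names, and the pattern labels such as `SVO` are hypothetical and chosen for illustration, and the constituents are assumed to have been identified already (for example by a dependency parser), so no parsing happens here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Clause:
    """One clause with its constituents already identified (assumed input)."""
    subject: str
    verb: str
    objects: List[str] = field(default_factory=list)
    complements: List[str] = field(default_factory=list)
    adverbials: List[str] = field(default_factory=list)


def clause_pattern(clause: Clause) -> str:
    """Label a clause by the grammatical functions of its constituents.

    Simplification: every constituent present contributes one letter; no
    distinction is made between obligatory and optional constituents.
    """
    pattern = "SV"
    pattern += "O" * len(clause.objects)
    pattern += "C" * len(clause.complements)
    pattern += "A" * len(clause.adverbials)
    return pattern


def to_proposition(clause: Clause) -> Tuple[str, ...]:
    """Represent a clause as an n-ary extraction (one slot per constituent)."""
    return (clause.subject, clause.verb,
            *clause.objects, *clause.complements, *clause.adverbials)


def to_triple(clause: Clause) -> Tuple[str, str, str]:
    """Represent the same clause as a (subject, relation, argument) triple."""
    args = clause.objects + clause.complements + clause.adverbials
    return (clause.subject, clause.verb, " ".join(args))


# Hand-written example clauses; a real system would derive these from text.
clauses = [
    Clause("Anna", "sleeps"),
    Clause("Anna", "reads", objects=["a book"]),
    Clause("Anna", "put", objects=["the book"], adverbials=["on the table"]),
]

for c in clauses:
    print(clause_pattern(c), to_proposition(c), to_triple(c))
```

The detected clause stays the same while `to_proposition` and `to_triple` give two different representations of it, which is the kind of application-specific customization the abstract refers to.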