DOI: 10.22028/D291-26564
Keywords: Knowledge acquisition, Crowdsourcing, Language model, Unstructured data, Information extraction, Knowledge extraction, Natural language, Information retrieval, Data science, WordNet, Computer science
Abstract: Ambiguity, complexity, and diversity in natural language textual expressions are major hindrances to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today. Even making payments for crowdsourced acquisition can quickly become prohibitively expensive. In this thesis we present principled methods that effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. Our methods complement automatic extraction techniques with human computing to reap the benefits of both while overcoming each other’s limitations. We present the architecture and implementation of HIGGINS, a system that combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. Using automatic methods, the IE engine compiles dictionaries of entity names and relational phrases. It further combines statistics derived from large Web corpora with semantic resources like WordNet and ConceptNet to expand the dictionary of relational phrases. It employs specifically designed statistical language models for phrase relatedness to come up with questions and relevant candidate answers that are presented to human workers. Through extensive experiments we establish the superiority of this approach in extracting relation-centric facts from text. In our experiments we extract facts about fictitious characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced. Finally, we also demonstrate how interesting human computing games can be designed for knowledge acquisition tasks.