Combining information extraction and human computing for crowdsourced knowledge acquisition

Authors: Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum

DOI: 10.1109/ICDE.2014.6816717

Abstract: Automatic information extraction (IE) enables the construction of very large knowledge bases (KBs), with relational facts on millions of entities from text corpora and Web sources. However, such KBs contain errors and they are far from being complete. This motivates the need for exploiting human intelligence using crowd-based human computing (HC) for assessing the validity of facts and for gathering additional knowledge. This paper presents a novel system architecture, called Higgins, which shows how to effectively integrate an IE engine and an HC engine. Higgins generates game questions where players choose or fill in missing relations for subject-relation-object triples. For generating multiple-choice answer candidates, we have constructed a dictionary of entity names and phrases, and have developed specifically designed statistical language models for phrase relatedness. To this end, we combine semantic resources like WordNet, ConceptNet, and others with statistics derived from a large Web corpus. We demonstrate the effectiveness of this approach to knowledge acquisition by crowdsourcing relationships between characters in narrative descriptions of movies and books.
