Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases

Authors: Zhaoan Dong, Ju Fan, Jiaheng Lu, Xiaoyong Du, Tok Wang Ling

DOI: 10.1007/978-3-319-96893-3_19

Keywords: Embedding, Computer science, Crowdsourcing, Machine learning, Knowledge base, Type (model theory), Selection (linguistics), Artificial intelligence, Type inference, Entity type

Abstract: Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are totally untyped. Even worse, fine-grained types (e.g., BasketballPlayer) containing rich semantic meanings are more likely to be incomplete, as they are more difficult to obtain. Existing machine-based algorithms use the predicates of entities (e.g., birthPlace) to infer their missing types, and they have the limitation that the predicates may be insufficient to infer fine-grained types. In this paper, we utilize crowdsourcing to solve the problem, and address the challenge of controlling the crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It first determines the types of some “representative” entities via crowdsourcing and then infers the types of the remaining entities based on the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference, which considers not only the distance between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection, which can better capture the uncertainty of the machine algorithm when identifying entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms the state-of-the-art algorithms by improving the effectiveness of fine-grained type completion at an affordable crowdsourcing cost.
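
The inference step described in the abstract, where types obtained from the crowd for a few entities are propagated to the remaining ones using embedding distances, can be illustrated with a minimal sketch. This is not the authors' formulation: the scoring function `influence_score`, the weight `alpha`, the toy 2-D embeddings, the placeholder entity names, and the `SoccerPlayer` type are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def influence_score(entity_vec, labeled_vec, type_vec, alpha=0.5):
    """Hypothetical score (an assumption, not the paper's formula).

    It grows when the untyped entity is close both to a crowd-labeled
    entity and to the embedding of that entity's type.
    """
    d_entity = euclidean(entity_vec, labeled_vec)
    d_type = euclidean(entity_vec, type_vec)
    return 1.0 / (1.0 + alpha * d_entity + (1.0 - alpha) * d_type)


def infer_types(unlabeled, crowd_labeled, type_embeddings, alpha=0.5):
    """Assign each untyped entity the type with the highest influence score."""
    result = {}
    for name, vec in unlabeled.items():
        best_type, best_score = None, -1.0
        for labeled_vec, type_name in crowd_labeled.values():
            score = influence_score(
                vec, labeled_vec, type_embeddings[type_name], alpha
            )
            if score > best_score:
                best_type, best_score = type_name, score
        result[name] = best_type
    return result


# Toy data (illustrative only): two "representative" entities typed by the crowd.
crowd_labeled = {
    "entity_a": ([1.0, 0.0], "BasketballPlayer"),
    "entity_b": ([0.0, 1.0], "SoccerPlayer"),
}
type_embeddings = {
    "BasketballPlayer": [0.9, 0.1],
    "SoccerPlayer": [0.1, 0.9],
}
unlabeled = {"entity_c": [0.8, 0.2], "entity_d": [0.2, 0.8]}

inferred = infer_types(unlabeled, crowd_labeled, type_embeddings)
print(inferred)
```

In this sketch `alpha` trades off the entity-to-entity distance against the entity-to-type distance; the paper's difficulty model for choosing which entities to send to the crowd is not covered here.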
