Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases

Authors: Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng

DOI: 10.1145/3132847.3132912

Abstract: With the vigorous development of the World Wide Web, many large-scale knowledge bases (KBs) have been generated. To improve the coverage of KBs, an important task is to integrate heterogeneous KBs. Several automatic alignment methods have been proposed which achieve considerable success. However, due to the inconsistency and uncertainty of KBs, automatic techniques for aligning KBs achieve low quality (especially low recall). Thanks to open crowdsourcing platforms, we can harness the crowd to improve alignment quality. To achieve this goal, in this paper we propose a novel hybrid human-machine framework for KB integration. We first partition the entities of different KBs into smaller blocks based on their relations. We then construct a partial order on these partitions and develop an inference model which crowdsources a set of tasks and infers the answers of other tasks based on the crowdsourced tasks. Next, we formulate the question selection problem, which, given a monetary budget B, selects B crowdsourced tasks to maximize the number of inferred tasks. We prove that this problem is NP-hard and propose greedy algorithms to address it with an approximation ratio of 1 - 1/e. Our experiments on real-world datasets indicate that our method improves the quality and outperforms state-of-the-art approaches.
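
The question selection problem in the abstract can be illustrated with the textbook greedy rule for maximum coverage, which is the standard way to obtain a 1 - 1/e guarantee for monotone submodular objectives. The sketch below is not the authors' algorithm: the `inferable` mapping (which tasks become answered once a given task is crowdsourced), the function name `greedy_select`, and the task labels are all hypothetical, and the way the paper derives inference from its partial order is not reproduced here.

```python
from typing import Dict, Hashable, List, Set


def greedy_select(inferable: Dict[Hashable, Set[Hashable]], budget: int) -> List[Hashable]:
    """Pick up to `budget` tasks to crowdsource.

    `inferable[t]` is an assumed input: the set of tasks (including t itself)
    whose answers are known once t has been answered by the crowd.
    Each round takes the task that resolves the most still-unresolved tasks.
    """
    resolved: Set[Hashable] = set()
    chosen: List[Hashable] = []
    for _ in range(budget):
        best, best_gain = None, 0
        for task, covers in inferable.items():
            if task in chosen:
                continue
            gain = len(covers - resolved)  # marginal number of newly resolved tasks
            if gain > best_gain:
                best, best_gain = task, gain
        if best is None:  # no remaining task resolves anything new
            break
        chosen.append(best)
        resolved |= inferable[best]
    return chosen


# Hypothetical example: five tasks, budget B = 2.
example = {
    "q1": {"q1", "q2", "q3"},
    "q2": {"q2"},
    "q3": {"q3", "q4"},
    "q4": {"q4"},
    "q5": {"q5"},
}
selected = greedy_select(example, budget=2)
```

In this toy input the first pick is the task that resolves three tasks at once; later picks are scored only by what they add beyond the tasks already resolved, which is the marginal-gain idea behind the approximation bound.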
