Automatically generating data linkages using a domain-independent candidate selection approach

Authors: Dezhao Song, Jeff Heflin

DOI: 10.1007/978-3-642-25073-6_41

Abstract: One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, publications, etc.) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics don't always accurately reflect the relative benefits of candidate selection, and we propose additional metrics. Our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire coreference process scales well by applying our technique. Surprisingly, this high recall, low precision filtering mechanism frequently leads to higher F-scores in the overall system.
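The candidate selection idea in the abstract (index instances on literal values, then look up similar instances instead of comparing every pair) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the character-bigram index, the Jaccard similarity, the 0.5 threshold, and the function names `char_bigrams`, `jaccard`, and `select_candidates` are all assumptions chosen for illustration, and the unsupervised choice of discriminating predicates is left out, so the input is assumed to already hold one literal value per instance.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return the set of lowercase character bigrams in a literal value."""
    s = text.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_candidates(instances, threshold=0.5):
    """Return candidate pairs of instance ids whose literals look similar.

    instances: dict mapping instance id -> literal value (hypothetical input
    format; the literal is assumed to come from a discriminating predicate).
    """
    grams = {iid: char_bigrams(value) for iid, value in instances.items()}

    # Inverted index: bigram -> ids of instances whose literal contains it.
    index = defaultdict(set)
    for iid, gs in grams.items():
        for g in gs:
            index[g].add(iid)

    candidates = set()
    for iid, gs in grams.items():
        # Only instances sharing at least one bigram are ever compared.
        neighbours = set()
        for g in gs:
            neighbours |= index[g]
        neighbours.discard(iid)

        for other in neighbours:
            pair = tuple(sorted((iid, other)))
            if pair in candidates:
                continue
            if jaccard(gs, grams[other]) >= threshold:
                candidates.add(pair)
    return candidates


if __name__ == "__main__":
    people = {"a": "Jeff Heflin", "b": "Jeff Hefflin", "c": "Dezhao Song"}
    print(select_candidates(people))
```

In this sketch, instances that share no bigram are never compared, which is where the pruning comes from. A permissive threshold favors recall over precision, in line with the abstract's description of a high recall, low precision filter that hands its candidates to a more precise coreference step.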
