Authors: Dezhao Song, Jeff Heflin
DOI: 10.1007/978-3-642-25073-6_41
Keywords:
Abstract: One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, publications, etc.) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics don't always accurately reflect the relative benefits of candidate selection, and we propose additional metrics. Our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well by applying our technique. Surprisingly, this high recall, low precision filtering mechanism frequently leads to higher F-scores in the overall system.