Authors: Wei Hu, Jianfeng Chen, Yuzhong Qu
Keywords: Information retrieval, Coreference, Computer science, Equivalence (formal languages), Discriminative model, Precision and recall, Inference, Semantic Web, Pragmatics
Abstract: An object on the Semantic Web is likely to be denoted with multiple URIs by different parties. Object coreference resolution aims to identify "equivalent" URIs that denote the same object. Driven by the Linking Open Data (LOD) initiative, millions of URIs have been explicitly linked with owl:sameAs statements, but the number of potentially coreferent ones is still considerable. Existing approaches address the problem mainly from two directions: one is based upon equivalence inference mandated by OWL semantics, which finds semantically coreferent URIs but probably omits many potential ones; the other works via similarity computation between property-value pairs, which is not always accurate enough. In this paper, we propose a self-training approach for object coreference resolution on the Semantic Web, which leverages the two classes of approaches to bridge the gap between semantically coreferent URIs and potential candidates. For an object URI, we first establish a kernel that consists of semantically coreferent URIs based on owl:sameAs, (inverse) functional properties and (max-)cardinalities, and then extend this kernel iteratively in terms of discriminative property-value pairs in the descriptions of URIs. In particular, the discriminability is learnt with a statistical measurement, which not only exploits key characteristics for representing an object, but also takes into account the matchability between properties from pragmatics. In addition, frequent property combinations are mined to improve the accuracy of the resolution. We implement a scalable system and demonstrate that our approach achieves good precision and recall for resolving object coreference, on both benchmark and large-scale datasets.
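The two-stage idea in the abstract (build a kernel of semantically coreferent URIs, then grow it iteratively through discriminative property-value pairs) can be illustrated with a minimal Python sketch. This is not the authors' system: the function names `same_as_kernel` and `extend_kernel`, the `ex:` URIs, and the toy data are hypothetical, the kernel here uses only owl:sameAs links (no functional properties or cardinalities), and the set of discriminative properties is simply given as input rather than learnt with a statistical measurement.

```python
from collections import defaultdict


def same_as_kernel(uri, same_as_pairs):
    """Return every URI reachable from `uri` through owl:sameAs links,
    treating the links as symmetric and transitive."""
    neighbours = defaultdict(set)
    for a, b in same_as_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    kernel, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for nxt in neighbours[current] - kernel:
            kernel.add(nxt)
            frontier.append(nxt)
    return kernel


def extend_kernel(kernel, descriptions, discriminative_props):
    """Repeatedly add URIs that share a (property, value) pair on a
    discriminative property with a URI already in the resolved set.
    `discriminative_props` is assumed to be given, not learnt."""
    resolved = set(kernel)
    changed = True
    while changed:
        changed = False
        known_pairs = {
            (p, v)
            for u in resolved
            for (p, v) in descriptions.get(u, set())
            if p in discriminative_props
        }
        for candidate, pairs in descriptions.items():
            if candidate not in resolved and pairs & known_pairs:
                resolved.add(candidate)
                changed = True
    return resolved


# Hypothetical toy data for illustration only.
same_as = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:x", "ex:y")]
descriptions = {
    "ex:a": {("foaf:mbox", "mailto:w@example.org"), ("rdfs:label", "Wei")},
    "ex:c": {("foaf:homepage", "http://example.org/w")},
    "ex:d": {("foaf:mbox", "mailto:w@example.org"),
             ("foaf:homepage", "http://example.org/d")},
    "ex:e": {("foaf:homepage", "http://example.org/d")},
    "ex:f": {("rdfs:label", "Wei")},
}
discriminative = {"foaf:mbox", "foaf:homepage"}

kernel = same_as_kernel("ex:a", same_as)
resolved = extend_kernel(kernel, descriptions, discriminative)
```

In this toy run, `ex:d` joins the set because it shares a mailbox with `ex:a`, and `ex:e` joins in the next round because it shares a homepage with the newly added `ex:d`; `ex:f` stays out because `rdfs:label` is not treated as discriminative here. The iteration mirrors the self-training flavour described in the abstract, where newly resolved URIs contribute further evidence.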