Authors: Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid, Andreas Thor
Keywords: Linked data, ENCODE, Graph (abstract data type), Computer science, Exploit, Semantic similarity, Controlled vocabulary, Ontology (information science), Information retrieval, Annotation
Abstract: Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric, AnnSim, that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential patterns.
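The 1-to-1 maximal weighted bipartite match described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: the Dice-style normalization (twice the matched weight divided by the total number of annotations), the brute-force search that stands in for the existing solvers the authors mention, and the function names `max_weight_matching`, `annsim`, and `toy_sim`. The term similarity values are invented purely for illustration.

```python
from itertools import permutations


def max_weight_matching(left, right, sim):
    """Brute-force 1-to-1 maximum-weight bipartite matching.

    Only suitable for small annotation sets; a real system would use a
    dedicated assignment solver instead (assumption: any exact solver works).
    Returns (total_weight, list of (left_term, right_term) pairs).
    """
    # Match the shorter side into the longer side.
    swapped = len(left) > len(right)
    small, large = (right, left) if swapped else (left, right)

    best_total, best_pairs = 0.0, []
    for chosen in permutations(large, len(small)):
        pairs = list(zip(small, chosen))
        if swapped:
            # Restore (left_term, right_term) orientation.
            pairs = [(b, a) for a, b in pairs]
        total = sum(sim(a, b) for a, b in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


def annsim(ann1, ann2, sim):
    """Annotation-set similarity, normalized to [0, 1] when sim is in [0, 1].

    Assumption: Dice-style normalization, 2 * matched weight / (|A1| + |A2|).
    """
    if not ann1 and not ann2:
        return 0.0
    total, _ = max_weight_matching(list(ann1), list(ann2), sim)
    return 2.0 * total / (len(ann1) + len(ann2))


# Hypothetical pairwise term similarities (illustrative values only).
TOY_SIM = {
    frozenset(["hypertension", "heart failure"]): 0.6,
    frozenset(["diabetes", "obesity"]): 0.5,
}


def toy_sim(a, b):
    """Symmetric toy similarity: 1.0 for identical terms, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset([a, b]), 0.0)


# Two hypothetical drugs annotated with disease (condition) terms.
drug_a = ["hypertension", "diabetes"]
drug_b = ["heart failure", "obesity", "diabetes"]
score = annsim(drug_a, drug_b, toy_sim)
print(round(score, 2))
```

In this toy case the best matching pairs hypertension with heart failure (0.6) and diabetes with diabetes (1.0), so the score is 2 × 1.6 / 5 = 0.64. In practice the term similarity would come from a taxonomy- or ontology-based metric such as those listed in the abstract, and the matching would be computed by a polynomial-time solver rather than by enumeration.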