Measuring Relatedness Between Scientific Entities in Annotation Datasets

Authors: Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid, Andreas Thor

DOI: 10.1145/2506583.2506651

Keywords: Linked data, ENCODE, Graph (abstract data type), Computer science, Exploit, Semantic similarity, Controlled vocabulary, Ontology (information science), Information retrieval, Annotation

Abstract: Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric, AnnSim, that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and an explanation of potential patterns.
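The 1-to-1 maximal weighted bipartite match described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the abstract: the function name `ann_sim`, the made-up term labels and scores in `TOY_SIM`, a symmetric term-level similarity, and the normalization by the combined size of the two annotation sets. It also uses brute-force enumeration rather than an efficient solver, so it is only suitable for very small annotation sets.

```python
from itertools import permutations

# Hypothetical pairwise similarities between CV terms (illustrative values only).
TOY_SIM = {
    frozenset(("t1", "t2")): 0.5,
    frozenset(("t1", "t3")): 0.8,
    frozenset(("t2", "t3")): 0.6,
    frozenset(("t2", "t4")): 0.2,
}


def toy_sim(x, y):
    """Symmetric term similarity: 1.0 for identical terms, 0.0 if unlisted."""
    if x == y:
        return 1.0
    return TOY_SIM.get(frozenset((x, y)), 0.0)


def ann_sim(terms_a, terms_b, sim):
    """Score two annotation sets with a 1-to-1 maximum-weight bipartite match.

    Brute force: tries every way of pairing each term of the smaller set with
    a distinct term of the larger set and keeps the best total similarity.
    The normalization below is an assumption made for this sketch.
    """
    a, b = list(terms_a), list(terms_b)
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a  # safe only because sim is assumed to be symmetric
    best = max(
        sum(sim(x, y) for x, y in zip(a, chosen))
        for chosen in permutations(b, len(a))
    )
    # Assumed normalization: twice the matched weight over the combined set size.
    return 2.0 * best / (len(a) + len(b))


if __name__ == "__main__":
    print(ann_sim(["t1", "t2"], ["t2", "t3", "t4"], toy_sim))
```

In this toy input the best match pairs `t1` with `t3` and `t2` with `t2`, leaving `t4` unmatched. For realistic annotation sets the enumeration would be replaced by a polynomial-time assignment solver, in line with the abstract's remark about exploiting existing solvers.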

References (26)
Harish Karnick, Sumit Bhagwani, Shrutiranjan Satapathy. Semantic textual similarity using maximal weighted bipartite graph matching. Joint Conference on Lexical and Computational Semantics, pp. 579-585 (2012).
Joseph Benik, Caren Chang, Louiqa Raschid, Maria-Esther Vidal, Guillermo Palma, Andreas Thor. Finding Cross Genome Patterns in Annotation Graphs. Lecture Notes in Computer Science, pp. 21-36 (2012). DOI: 10.1007/978-3-642-31040-9_3
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim. Proceedings of the VLDB Endowment, vol. 4, pp. 992-1003 (2011). DOI: 10.14778/3402707.3402736
Toralf Kirsten, Erhard Rahm, Andreas Thor. Instance-based matching of hierarchical ontologies. BTW, pp. 436-448 (2007).
Serguei V.S. Pakhomov, Ted Pedersen, Bridget T. McInnes. UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. American Medical Informatics Association Annual Symposium, vol. 2009, pp. 431-435 (2009).
Genevieve B. Melton, Serguei Pakhomov, Ted Pedersen, Bridget McInnes, Terrence Adam, Ying Liu. Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. American Medical Informatics Association Annual Symposium, vol. 2010, pp. 572-576 (2010).
Dekang Lin. An Information-Theoretic Definition of Similarity. International Conference on Machine Learning, pp. 296-304 (1998).
Michael A. Bender, Martín Farach-Colton, Giridhar Pemmasani, Steven Skiena, Pavel Sumazin. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms, vol. 57, pp. 75-94 (2005). DOI: 10.1016/J.JALGOR.2005.08.001
David Aumueller, Hong-Hai Do, Sabine Massmann, Erhard Rahm. Schema and ontology matching with COMA++. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD '05), pp. 906-908 (2005). DOI: 10.1145/1066157.1066283