BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing

Authors: Chao Pang, Dennis Hendriksen, Martijn Dijkstra, K Joeri van der Velde, Joel Kuiper

DOI: 10.1136/AMIAJNL-2013-002577

Keywords: Data integration, Set (abstract data type), Search engine indexing, Rank (computer programming), Matching (statistics), Software, Data collection, Information retrieval, Computer science, User interface

Abstract: Objective: Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure is arduous and time consuming. Materials and methods: To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. Results: We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. Conclusions: BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app, and a demo is available.
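
Steps (2) to (4) of the method, together with the rank-based evaluation, can be illustrated with a minimal Python sketch. It is not the BiobankConnect implementation: the `EXPANSIONS` table is an invented stand-in for the synonym and subclass lookups that the paper performs with BioPortal and OntoCAT, the Dice token-overlap score is a simple substitute for Lucene scoring, and all function names and sample data elements are hypothetical.

```python
import re
from collections import Counter

# Hypothetical synonym/subclass table; stands in for ontology lookups.
EXPANSIONS = {
    "hypertension": ["high blood pressure", "elevated blood pressure"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_query(term):
    """Step 2: return the term plus any known synonyms/subclass labels."""
    return [term] + EXPANSIONS.get(term.lower(), [])

def overlap_score(query, label):
    """Dice coefficient on token multisets (assumed stand-in for Lucene)."""
    q, l = Counter(tokenize(query)), Counter(tokenize(label))
    total = sum(q.values()) + sum(l.values())
    if total == 0:
        return 0.0
    return 2 * sum((q & l).values()) / total

def rank_matches(term, elements, top_k=10):
    """Steps 3-4: score each element by its best expanded query, then sort."""
    queries = expand_query(term)
    scored = [(max(overlap_score(q, e) for q in queries), e) for e in elements]
    scored = [(s, e) for s, e in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k shortlisted elements that are relevant."""
    top = [e for _, e in ranked[:k]]
    return sum(e in relevant for e in top) / k if top else 0.0

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant elements found in the top k."""
    top = {e for _, e in ranked[:k]}
    return len(top & set(relevant)) / len(relevant) if relevant else 0.0

# Invented sample data elements, for illustration only.
elements = [
    "History of high blood pressure",
    "Systolic blood pressure (mmHg)",
    "Body weight (kg)",
    "Ever diagnosed with hypertension",
]
ranked = rank_matches("hypertension", elements)
relevant = {"History of high blood pressure", "Ever diagnosed with hypertension"}
print(ranked)
print(precision_at_k(ranked, relevant, 1), recall_at_k(ranked, relevant, 10))
```

In this toy data the unexpanded term "hypertension" would match only the last element lexically; the expanded synonyms are what bring the blood-pressure elements into the shortlist, which is the effect the query expansion step is meant to achieve.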
