Authors: Chao Pang, Dennis Hendriksen, Martijn Dijkstra, K Joeri van der Velde, Joel Kuiper
DOI: 10.1136/AMIAJNL-2013-002577
Keywords: Data integration, Set (abstract data type), Search engine indexing, Rank (computer programming), Matching (statistics), Software, Data collection, Information retrieval, Computer science, User interface
Abstract: Objective: Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure is arduous and time consuming. Materials and methods: To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching the available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. Results: We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, the best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. Conclusions: BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app, and a demo is available.
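The expand, match, and shortlist steps named in the abstract, together with the two reported metrics, can be illustrated with a minimal Python sketch. It is not the BiobankConnect implementation. The `EXPANSIONS` table is a hypothetical stand-in for the synonym and subclass lookups the paper attributes to BioPortal and OntoCAT. A simple token-overlap (Jaccard) score is swapped in for Lucene lexical matching. All function names and the sample data elements are invented for illustration.

```python
import re

# Hypothetical synonym/subclass table; the paper obtains such terms from
# ontologies via BioPortal and OntoCAT, which this sketch does not call.
EXPANSIONS = {
    "hypertension": ["high blood pressure", "elevated blood pressure"],
}


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(desired):
    """Return the desired label plus any terms found in the toy table."""
    terms = [desired]
    for key, extra in EXPANSIONS.items():
        if key in desired.lower():
            terms.extend(extra)
    return terms


def score(term, candidate):
    """Jaccard token overlap, a stand-in for a Lucene relevance score."""
    a, b = tokenize(term), tokenize(candidate)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def shortlist(desired, available, top_k=10):
    """Rank available elements by their best score over all expanded terms."""
    terms = expand_query(desired)
    scored = [(max(score(t, c) for t in terms), c) for c in available]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked items that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for c in ranked[:k] if c in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the first k ranks."""
    if not relevant:
        return 0.0
    return sum(1 for c in ranked[:k] if c in relevant) / len(relevant)


if __name__ == "__main__":
    available = [
        "History of high blood pressure",
        "Body mass index",
        "Systolic blood pressure measurement",
    ]
    for s, element in shortlist("Hypertension", available):
        print(f"{s:.2f}  {element}")
```

Without the expansion step, the query "Hypertension" shares no tokens with any of the sample elements, so the shortlist would be empty. This is the gap that ontology-based query expansion is meant to close.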