Authors: Stephen Joseph Wilson, Angela Dawn Wilkins, Matthew V. Holt, Byung Kwon Choi, Daniel Konecki
DOI: 10.1101/403667
Keywords:
Abstract: The scientific literature is vast, growing, and increasingly specialized, making it difficult to connect disparate observations across subfields. To address this problem, we sought to develop automated hypothesis generation by networking at scale the MeSH terms curated by the National Library of Medicine. The result is a MeSH Term Objective Reasoning (MeTeOR) approach that tallies associations among genes, drugs, and diseases from PubMed and predicts new ones. Comparisons to reference databases and algorithms show that MeTeOR tends to be more reliable. We also show that many predictions based on the literature prior to 2014 were published subsequently. In a practical application, we experimentally validated a surprising, novel association found between Epidermal Growth Factor Receptor (EGFR) and CDK2. We conclude that MeTeOR generates useful hypotheses from the literature (http://meteor.lichtargelab.org/). Author summary: The large size and exponential expansion of the scientific literature form a bottleneck to accessing and understanding published findings. Manual curation and Natural Language Processing (NLP) aim to address this bottleneck by summarizing and disseminating the knowledge within articles as key relationships (e.g., TP53 relates to Cancer). However, these methods compromise on either coverage or accuracy, respectively. To mitigate this compromise, we proposed using manually assigned keywords (MeSH terms) to extract relationships from publications, and demonstrated comparable coverage but higher accuracy than current NLP methods. Furthermore, we combined the extracted knowledge with semi-supervised machine learning to create hypotheses to guide future work, and discovered a direct interaction between two important cancer genes.
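
The idea of tallying associations among MeSH terms can be illustrated with a minimal co-occurrence count. This is only a sketch under stated assumptions, not the MeTeOR implementation: the article identifiers, the term lists, and the function name `tally_cooccurrences` are hypothetical, and the abstract does not specify how associations are actually weighted or filtered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: article ID -> MeSH terms assigned to that article.
# These records are illustrative placeholders, not real PubMed annotations.
articles = {
    "article_A": ["EGFR", "CDK2", "Neoplasms"],
    "article_B": ["EGFR", "Neoplasms"],
    "article_C": ["TP53", "Neoplasms"],
}


def tally_cooccurrences(articles):
    """Count how many articles mention each unordered pair of terms."""
    counts = Counter()
    for terms in articles.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


counts = tally_cooccurrences(articles)

# List the pairs by descending article count.
for (term_a, term_b), n in counts.most_common():
    print(f"{term_a} - {term_b}: {n}")
```

In such a scheme, each pair count could serve as an edge weight in a term network. Predicting new associations, as the abstract describes, would need an additional step (for example, a link-prediction model) that is not shown here.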