Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed?

Authors: Janus Wawrzinek, Wolf-Tilo Balke

DOI: 10.1007/978-3-030-04257-8_2

Keywords: Space (commercial competition), Digital library, Data science, Meaning (linguistics), Field (computer science), Information extraction, Term (time), Artificial neural network, Computer science, Scalability

Abstract: The exponential increase of scientific publications in the medical field urgently calls for innovative access paths beyond the limits of a term-based search. As an example, the search term "diabetes" returns over 600,000 documents in the digital library PubMed. In such cases, the automatic extraction of semantic relations between important entities such as active substances, diseases, and genes can help reveal entity relationships and thus allow simplified access to the knowledge embedded in digital libraries. On the other hand, for semantic-relation tasks, distributional embedding models based on neural networks promise considerable progress in terms of accuracy, performance, and scalability. Yet, despite recent successes in this field, questions arise related to their non-deterministic nature: Are the extracted relationships meaningful, and can they perhaps even reveal new, unknown entity relationships? In this paper, we address this question by measuring associations between pharmaceutical entities such as active substances (drugs) and diseases in high-dimensional embedding space. In our investigation, we show that while, on the one hand, only a few contextualized associations directly correlate with spatial distance, on the other hand, we have discovered the potential for predicting associations, which makes the method suitable as a new, literature-based technique for practical applications such as drug repurposing.
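
The abstract relates drug-disease associations to spatial distance in an embedding space. As a minimal illustration only, assuming cosine similarity as the distance measure (the abstract does not name the metric used), the Python sketch below ranks candidate drugs by their similarity to a disease vector. The vectors and the names EMBEDDINGS, cosine_similarity, and rank_by_similarity are hypothetical; in practice the vectors would come from an embedding model trained on PubMed text (for example, a word2vec-style model), which is an assumption rather than the authors' stated setup.

```python
import math

# Hypothetical toy embeddings with made-up values. Real vectors would be
# learned from PubMed abstracts and have hundreds of dimensions.
EMBEDDINGS = {
    "diabetes": [0.85, 0.15, 0.15],
    "metformin": [0.9, 0.1, 0.2],
    "insulin": [0.7, 0.3, 0.1],
    "salbutamol": [0.1, 0.9, 0.3],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates, embeddings):
    """Return (candidate, score) pairs sorted by decreasing similarity to query."""
    q = embeddings[query]
    scored = [(c, cosine_similarity(q, embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity(
    "diabetes", ["metformin", "insulin", "salbutamol"], EMBEDDINGS
)
for drug, score in ranking:
    print(f"{drug}: {score:.3f}")
```

Under this assumption, a drug that ranks close to a disease but has no documented link to it would be a candidate association to verify against the literature, which is the kind of signal relevant to drug repurposing.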
