Enriching Scientific Publications from LOD Repositories Through Word Embeddings Approach

Authors: Arben Hajra, Klaus Tochtermann

DOI: 10.1007/978-3-319-49157-8_24

Keywords: Linked data, Semantic Web, Word embedding, Information retrieval, Computer science, Cosine similarity, Interoperability, Digital library, Word lists by frequency, Word2vec

Abstract: The era of digitalization is increasingly emphasizing the role of Digital Libraries (DLs) by raising the requirements for, and expectations of, the services they provide. Interoperability among repositories and other resources continues to be a subject of research in the field. Retrieving publications related to a particular topic from different DLs, especially from diverse domains, requires several clicks and online visits to many points of access. Achieving cross-linking of publications, authors and data, however, would facilitate scholarly communication in general. Starting from a single point, a scholar would be able to find resources, i.e., publications and authors, previously enriched with information from other repositories. Repositories available as Semantic Web content, such as bibliographic Linked Open Data (LOD) datasets, are the focus of this study. Primarily, we consider the existing alignments of concepts between repositories. Improvements in the measurement of relatedness are possible through the application of text-mining techniques. The paper introduces preliminary experiments conducted with vector space models through TF-IDF and Cosine Similarity (CS). Additionally, it discusses applying the word embedding approach, which focuses mainly on the context of words through distributed representations, instead of word frequency, weighting and string matching. We apply the contemporary Word2Vec model as a similar deep learning approach for distributed representations.
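The vector space step named in the abstract (TF-IDF weighting followed by cosine similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names and the three toy titles are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict: term -> weight) per document.

    Assumption: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. The paper may use a different variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical publication titles, not taken from any real repository.
titles = [
    "linked open data for digital libraries",
    "digital libraries and linked data interoperability",
    "protein folding with molecular dynamics",
]
vectors = tfidf_vectors(titles)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

Because this measure depends on shared surface terms, two titles with no word in common score zero even when they are topically related. That limitation is what the word embedding approach discussed in the abstract is meant to address.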
