Authors: Amna Dridi, Mohamed Medhat Gaber, R. Muhammad Atif Azad, Jagdev Bhogal
DOI: 10.1007/978-3-030-01771-2_21
Keywords:
Abstract: Word embeddings are increasingly attracting the attention of researchers dealing with semantic similarity and analogy tasks. However, finding the optimal hyper-parameters remains an important challenge because of their impact on the revealed analogies, particularly for domain-specific corpora. Since analogies are widely used for hypothesis synthesis, it is crucial to optimise word embedding hyper-parameters for precise hypothesis synthesis. Therefore, we propose, in this paper, a methodological approach to hyper-parameter tuning that uses the stability of the k-nearest neighbors of word vectors within scientific corpora, more specifically Computer Science corpora, with Machine Learning adopted as a case study. The approach is tested on a dataset created from NIPS (Conference on Neural Information Processing Systems) publications, and evaluated against a curated ACM hierarchy and the Wikipedia Machine Learning outline as a gold standard. Our quantitative and qualitative analyses indicate that our approach not only reliably captures interesting patterns like “unsupervised_learning is to kmeans as supervised_learning is to knn”, but also captures the analogical structure of the domain, consistently outperforming the state-of-the-art syntactic accuracy of \(61\%\) by reaching \(68\%\).
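The core idea of the abstract — comparing the stability of each word's k-nearest neighbours across embedding models trained with different hyper-parameters — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`knn_sets`, `knn_stability`), the use of mean Jaccard overlap, and the random matrices standing in for trained word2vec embeddings are all assumptions made for the sketch.

```python
import numpy as np

def knn_sets(emb, k):
    # Cosine k-NN for every word; emb rows are word vectors.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbour
    return [set(np.argsort(-row)[:k]) for row in sims]

def knn_stability(emb_a, emb_b, k=5):
    """Mean Jaccard overlap of each word's k nearest neighbours
    across two embedding spaces (higher = more stable)."""
    sets_a, sets_b = knn_sets(emb_a, k), knn_sets(emb_b, k)
    return float(np.mean([len(a & b) / len(a | b)
                          for a, b in zip(sets_a, sets_b)]))

# Stand-ins for embeddings from two hyper-parameter settings.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 50))
noisy = base + 0.01 * rng.normal(size=base.shape)  # nearly identical model
other = rng.normal(size=(100, 50))                 # unrelated model

print(knn_stability(base, noisy))  # high: neighbourhoods preserved
print(knn_stability(base, other))  # low: neighbourhoods differ
```

In a real tuning loop, one would train word2vec models under candidate hyper-parameter settings on the same corpus and prefer settings whose neighbourhoods remain stable across runs.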