Authors: Tomáš Brychcín, Lukáš Svoboda
DOI: 10.14311/NNW.2018.28.029
Keywords:
Abstract: In this paper we extend the Skip-Gram and Continuous Bag-of-Words distributional word representation models with global context information. We use a corpus extracted from Wikipedia, where articles are organized in a hierarchy of categories. These categories provide useful topical information about each article. We present four new approaches for enriching word meaning representations with such information. We experiment with the English Wikipedia and evaluate our models on standard word similarity and word analogy datasets. The proposed models significantly outperform other word representation methods when training data of similar size is used, and provide similar performance compared with methods trained on much larger data. Our approach shows that increasing the amount of unlabelled data does not necessarily increase the performance of word embeddings as much as introducing global or sub-word information, especially when training time is taken into consideration.
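To make the abstract's core idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of how Skip-Gram training pairs can be enriched with a document-level category label, one plausible reading of adding "global context information" from Wikipedia's category hierarchy. The function name, the toy sentence, and the `category` parameter are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2, category=None):
    """Generate (target, context) pairs for Skip-Gram training.

    If a document-level category label is given, it is appended as an
    extra context word for every target -- a simplified illustration
    (not the paper's actual method) of enriching local contexts with
    global topical information such as a Wikipedia category.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target itself
                pairs.append((target, tokens[j]))
        if category is not None:
            pairs.append((target, category))  # global-context pair
    return pairs

sentence = ["distributional", "word", "representations"]
print(skipgram_pairs(sentence, window=1, category="NLP"))
```

Each target now predicts both its local neighbours and the article's category token, so topically related words across different articles share a common context signal.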