Author: Lucas Rettenmeier
DOI:
Keywords:
Abstract: Word embeddings are computed by a class of techniques within natural language processing (NLP) that create continuous vector representations of words from a large text corpus. The stochastic nature of the training process of most embedding techniques can lead to surprisingly strong instability, i.e. subsequently applying the same technique to the same data twice can produce entirely different results. In this work, we present an experimental study on the instability of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText. Based on the results, we propose a statistical model to describe the instability and introduce a novel metric to measure the instability of the representation of an individual word. Finally, we present a method to minimize the instability - by computing a modified average over multiple runs - and apply it to a specific linguistic problem: the detection and quantification of semantic change, i.e. measuring changes in word meaning and usage over time.
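The abstract does not spell out how the per-word instability metric or the multi-run average is computed, so the following is only a minimal illustrative sketch. It assumes embedding runs are stored as NumPy matrices, aligns them with orthogonal Procrustes (a standard choice for comparing embedding spaces; the paper may use a different alignment), measures a word's instability as one minus the mean pairwise cosine similarity of its vectors across aligned runs, and uses a plain mean of the aligned runs as a stand-in for the paper's "modified average". The names `procrustes_align` and `word_instability` are hypothetical.

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal matrix Q minimizing ||A @ Q - B||_F (orthogonal
    Procrustes), used to map one embedding run into another's space."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def word_instability(runs, idx):
    """Illustrative per-word instability: 1 minus the mean pairwise
    cosine similarity of word idx's vector across aligned runs."""
    base = runs[0]
    aligned = [base] + [r @ procrustes_align(r, base) for r in runs[1:]]
    vecs = np.stack([r[idx] / np.linalg.norm(r[idx]) for r in aligned])
    sims = vecs @ vecs.T                     # pairwise cosine similarities
    n = len(vecs)
    return 1.0 - (sims.sum() - n) / (n * (n - 1))  # off-diagonal mean

# Toy demo: three "runs" = random rotations of one base matrix plus noise,
# mimicking the rotational freedom that makes raw runs incomparable.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 50))
runs = []
for _ in range(3):
    Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal map
    runs.append(base @ Q + 0.01 * rng.normal(size=base.shape))

print(f"instability of word 0: {word_instability(runs, 0):.4f}")

# A plain mean of the aligned runs; the paper's "modified average"
# presumably refines this simple baseline.
aligned = [runs[0]] + [r @ procrustes_align(r, runs[0]) for r in runs[1:]]
averaged = np.mean(np.stack(aligned), axis=0)
```

On this toy data the instability is close to zero, since the runs differ only by a rotation and small noise; embeddings trained independently on real corpora would typically score higher.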