Authors: Stephen Taylor, Tomáš Brychcin
DOI:
Keywords:
Abstract: Since Mikolov introduced word analogies as an example of semantic composition by vector addition, they have inspired both enthusiasm and disdain. If the arithmetic computation works, the relationship encoded in the word vectors should manifest itself as parallel difference vectors, and if the difference vectors are parallel, this should appear in two-dimensional projections. For Principal Component Analysis (PCA) bases computed on just the words of a relation's pairs, this seems to be true. However, PCA on larger subsets of the vocabulary typically shows a wide range of directions for difference vectors in the same relation. The PCA phenomenon is evidence for our suggestion that there is a subspace for each relation in which the difference vectors are parallel; that is, only a subset of the semantic information for each word participates in the relation. To approximate such a subspace, we train a linear transformation which moves a portion of the pairs in a relation so that the difference vectors become nearly parallel to each other, while minimizing the movement of unrelated words. We see a net improvement not only on analogies that include pairs from the training set, but also on analogies between held-out pairs in the same relation. The trained transformation thus seems to isolate semantic components expressed by the relation.
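The abstract describes the training objective only at a high level: a linear transformation is fit so that the difference vectors of a relation's pairs become nearly parallel, while unrelated words move as little as possible. Below is a minimal, hypothetical sketch of one way such an objective could be set up, using PyTorch on synthetic stand-in vectors. The specific loss terms (cosine alignment of difference vectors to their mean direction, plus an L2 anchor on unrelated words) and their weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch

# Synthetic stand-ins for pretrained word vectors (the paper uses real embeddings).
d = 50
pairs = torch.randn(20, 2, d)        # 20 (a, b) word pairs from one relation
unrelated = torch.randn(200, d)      # words that do not participate in the relation

# Linear transformation, initialized to the identity so words start unmoved.
W = torch.eye(d, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)

for step in range(500):
    a = pairs[:, 0] @ W.T
    b = pairs[:, 1] @ W.T
    diffs = b - a
    diffs = diffs / diffs.norm(dim=1, keepdim=True)
    mean_dir = diffs.mean(dim=0)
    mean_dir = mean_dir / mean_dir.norm()

    # Encourage difference vectors to be parallel: high cosine with their mean direction.
    parallel_loss = 1.0 - (diffs @ mean_dir).mean()
    # Penalize movement of unrelated words under the transformation.
    anchor_loss = ((unrelated @ W.T - unrelated) ** 2).mean()

    loss = parallel_loss + 0.1 * anchor_loss  # weighting is an illustrative choice
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The identity initialization and the small anchor weight reflect the stated goal of moving unrelated words as little as possible; the split into training pairs and held-out pairs used for evaluation in the paper is not shown here.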