Authors: Shusen Liu, Peer-Timo Bremer, Jayaraman J. Thiagarajan, Vivek Srikumar, Bei Wang
DOI: 10.1109/TVCG.2017.2745141
Keywords:
Abstract: Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, such techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, along with corresponding tests to determine whether the resulting views capture salient structures. Additionally, we present two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.
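The two baseline projection techniques named in the abstract, t-SNE and PCA, can be illustrated with a minimal sketch. This is not the authors' method; it only shows the common two-dimensional embedding workflow the abstract refers to, using scikit-learn on synthetic stand-in vectors (the array shapes and parameter choices here are illustrative assumptions, not values from the paper).

```python
# Minimal sketch: projecting synthetic "word vectors" to 2-D with PCA and t-SNE,
# the two techniques the abstract says are commonly used in the NLP community.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for learned word embeddings: 50 hypothetical "words" in 100 dimensions.
embeddings = rng.normal(size=(50, 100))

# PCA: a linear projection, typically used to explore linear relationships
# such as word analogies.
pca_2d = PCA(n_components=2).fit_transform(embeddings)

# t-SNE: a nonlinear projection, typically used to assess overall/local
# neighborhood structure.
tsne_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)

print(pca_2d.shape)   # (50, 2)
print(tsne_2d.shape)  # (50, 2)
```

As the abstract notes, both projections can be misleading on their own, which is the motivation for the paper's tailored embedding techniques and uncertainty-augmented t-SNE views.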