Authors: Satya Almasian, Andreas Spitz, Michael Gertz
DOI: 10.1007/978-3-030-15712-8_20
Keywords:
Abstract: Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating entities in the training corpus should result in more intelligent features for downstream tasks, performance issues arise when embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the quality of the non-entity embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate how to jointly train word and entity embeddings on a large corpus with automatically linked entities. We discuss two distinct approaches to the generation of such embeddings, namely training state-of-the-art models on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare these embeddings to classical word embeddings on a variety of similarity, analogy, and clustering evaluation tasks, as well as on entity-specific tasks. Our findings show that it takes more than an annotated corpus to create entity embeddings that perform acceptably on common test cases. Based on these results, we discuss how embeddings derived from a graph representation of the text can restore the performance.
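To make the first of the two strategies in the abstract concrete, the sketch below contrasts training the same skip-gram model on a raw corpus and on an entity-annotated version in which linked mentions are collapsed into single entity tokens. This is a minimal illustration assuming gensim's Word2Vec, not the authors' implementation; the entity token format (e.g. "ENTITY/Moon") and the toy sentences are invented for the example.

```python
# Minimal sketch (not the paper's code): training word2vec on a raw corpus
# versus an entity-annotated corpus where linked mentions become single tokens.
from gensim.models import Word2Vec

# Raw-text corpus: entity mentions are left as ordinary words.
raw_corpus = [
    ["neil", "armstrong", "walked", "on", "the", "moon"],
    ["the", "moon", "landing", "was", "in", "1969"],
]

# Entity-annotated corpus: hypothetical "ENTITY/..." tokens stand in for linked entities.
annotated_corpus = [
    ["ENTITY/Neil_Armstrong", "walked", "on", "the", "ENTITY/Moon"],
    ["the", "ENTITY/Moon_landing", "was", "in", "1969"],
]

# Train the same skip-gram model on both versions of the corpus.
raw_model = Word2Vec(raw_corpus, vector_size=50, window=5, min_count=1, sg=1)
annotated_model = Word2Vec(annotated_corpus, vector_size=50, window=5, min_count=1, sg=1)

# Entity vectors can be queried directly from the annotated model, while the
# raw model only knows the surface words that made up each mention.
print(annotated_model.wv.most_similar("ENTITY/Moon", topn=3))
print(raw_model.wv.most_similar("moon", topn=3))
```

On a real corpus, comparing the two models on word similarity, analogy, and entity-specific test sets is what surfaces the performance gap the abstract describes.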