Authors: Magnus Sahlgren, Jussi Karlgren, Jon Holmlund
DOI:
Keywords: Lexicon, Information retrieval, Vector space, Natural language processing, Artificial intelligence, Computer science, Semantic vector, Word list, Gold standard (test)
Abstract: This paper proposes a novel method for automatically acquiring multi-lingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word list of manually chosen reference points taken from available bi-lingual dictionaries. Other words can then be related to these reference points, first in the one language and then in the other. In the present experiments, we apply the proposed method to comparable but non-parallel English-German data. The resulting bi-lingual lexicon is evaluated using an online lexicon as gold standard. The results clearly demonstrate the viability of the proposed methodology.
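The alignment idea in the abstract can be illustrated with a minimal Python sketch. This is one possible reading, not the authors' implementation. Each word is described by its similarity to the reference words within its own language's space, and words are then matched across languages by comparing those similarity profiles. The toy vectors, the function names `profile` and `translate`, and the use of cosine similarity are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def profile(word, space, refs):
    """Describe a word by its similarity to each reference word in its own space."""
    return [cosine(space[word], space[r]) for r in refs]

def translate(word, src_space, tgt_space, ref_pairs):
    """Pick the target word whose reference profile is closest to the source word's."""
    src_refs = [s for s, _ in ref_pairs]
    tgt_refs = [t for _, t in ref_pairs]
    src_profile = profile(word, src_space, src_refs)
    # Reference words are already aligned, so only non-reference words are candidates.
    candidates = [w for w in tgt_space if w not in set(tgt_refs)]
    return max(
        candidates,
        key=lambda w: cosine(src_profile, profile(w, tgt_space, tgt_refs)),
    )

# Hypothetical toy spaces; the two spaces are independent and may differ in dimensionality.
english = {
    "dog": [1.0, 0.1, 0.0],
    "car": [0.0, 1.0, 0.2],
    "cat": [0.9, 0.2, 0.1],
    "truck": [0.1, 0.9, 0.3],
}
german = {
    "Hund": [0.2, 1.0],
    "Auto": [1.0, 0.1],
    "Katze": [0.3, 0.9],
    "Lastwagen": [0.9, 0.2],
}
# Hypothetical reference word list, standing in for entries from a bi-lingual dictionary.
reference_pairs = [("dog", "Hund"), ("car", "Auto")]

print(translate("cat", english, german, reference_pairs))
print(translate("truck", english, german, reference_pairs))
```

Because the profiles are expressed relative to the shared reference points, the two underlying spaces never need to share dimensions or training data, which is what allows non-parallel corpora to be used.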