Creating Bilingual Lexica Using Reference Wordlists for Alignment of Monolingual Semantic Vector Spaces

Authors: Magnus Sahlgren, Jussi Karlgren, Jon Holmlund

Keywords: Lexicon, Information retrieval, Vector space, Natural language processing, Artificial intelligence, Computer science, Semantic vector, Word list, Gold standard (test)

Abstract: This paper proposes a novel method for automatically acquiring multi-lingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word list of manually chosen reference points taken from available bi-lingual dictionaries. Other words can then be related to these reference points, first in one language and then in the other. In the present experiments, we apply the proposed method to comparable but non-parallel English-German data. The resulting bi-lingual lexicon is evaluated using an online lexicon as gold standard. The results clearly demonstrate the viability of the proposed methodology.
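
The alignment idea in the abstract can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation: each word is described by its cosine similarity to every reference point in its own monolingual space, and these similarity profiles are then compared across languages. The function names (`cosine`, `reference_profile`, `translate`), the choice of cosine similarity, and the toy vectors are all assumptions made for illustration. The two toy spaces deliberately have different dimensionality to show that they are never compared directly.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def reference_profile(word, space, anchors):
    """Describe a word by its similarity to each reference point in its own space."""
    return [cosine(space[word], space[a]) for a in anchors]

def translate(word, src_space, tgt_space, reference_pairs):
    """Return the target word whose reference profile best matches the source word's."""
    src_anchors = [s for s, _ in reference_pairs]
    tgt_anchors = [t for _, t in reference_pairs]
    profile = reference_profile(word, src_space, src_anchors)
    # Exclude the reference points themselves from the candidate translations.
    candidates = [w for w in tgt_space if w not in tgt_anchors]
    return max(
        candidates,
        key=lambda w: cosine(profile, reference_profile(w, tgt_space, tgt_anchors)),
    )

# Made-up toy vectors (assumption): two independent spaces of different dimensionality.
en = {"dog": [1.0, 0.0], "car": [0.0, 1.0], "cat": [0.9, 0.1], "truck": [0.1, 0.9]}
de = {
    "Hund": [0.0, 1.0, 0.0],
    "Auto": [1.0, 0.0, 0.0],
    "Katze": [0.1, 0.9, 0.0],
    "Lastwagen": [0.9, 0.1, 0.0],
}
# Hypothetical reference word list, standing in for entries from a bilingual dictionary.
pairs = [("dog", "Hund"), ("car", "Auto")]

print(translate("cat", en, de, pairs))    # Katze
print(translate("truck", en, de, pairs))  # Lastwagen
```

With only two reference points the profiles are very coarse; a real reference word list would need many more entries for the profiles to discriminate between words.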
