Authors: Ralf Steinberger, Bruno Pouliquen, Johan Hagman
Keywords:
Abstract: We present an approach to calculating the semantic similarity of documents written in the same or in different languages. The calculation is achieved by representing the document contents in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and then calculating the distance between these representations. While EUROVOC is a carefully handcrafted knowledge structure, our procedure uses statistical techniques. The method was applied to a collection of 5990 English and Spanish parallel texts and evaluated by measuring the number of times the translation of a given document was identified as the most similar document. The good results showed the feasibility and usefulness of the approach.
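
The two steps named in the abstract, comparing language-independent descriptor representations and counting how often a document's translation is ranked most similar, can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: each document is taken to be already mapped to a sparse vector of weighted descriptors, cosine similarity stands in for the unspecified distance measure, and the descriptor identifiers (`D_FISHERIES`, etc.), weights, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse descriptor-weight vectors (dicts)."""
    dot = sum(w * b[d] for d, w in a.items() if d in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation is similar to nothing
    return dot / (norm_a * norm_b)


def translation_retrieval_accuracy(source_docs, target_docs):
    """Fraction of source documents whose most similar target document
    is their own translation (identified here by a shared key)."""
    if not source_docs:
        return 0.0
    hits = 0
    for doc_id, vec in source_docs.items():
        best = max(target_docs,
                   key=lambda t: cosine_similarity(vec, target_docs[t]))
        if best == doc_id:
            hits += 1
    return hits / len(source_docs)


# Hypothetical toy data: descriptor identifiers and weights are invented
# for illustration and are not taken from the paper or from EUROVOC.
english = {
    "doc1": {"D_FISHERIES": 0.9, "D_QUOTA": 0.4},
    "doc2": {"D_ENERGY": 0.8, "D_TAX": 0.5},
}
spanish = {
    "doc1": {"D_FISHERIES": 0.7, "D_QUOTA": 0.6},
    "doc2": {"D_ENERGY": 0.6, "D_TAX": 0.7, "D_QUOTA": 0.1},
}

accuracy = translation_retrieval_accuracy(english, spanish)
```

Because both language versions are expressed over the same descriptor space, no translation step is needed at comparison time; the cross-lingual part of the work lies entirely in assigning descriptors to each text.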