Cross-Lingual Document Similarity Calculation Using the Multilingual Thesaurus EUROVOC

Authors: Ralf Steinberger, Bruno Pouliquen, Johan Hagman

DOI: 10.1007/3-540-45715-1_44

Abstract: We present an approach to calculating the semantic similarity of documents written in the same or in different languages. The calculation is achieved by representing the document contents in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and then measuring the distance between these representations. While EUROVOC is a carefully handcrafted knowledge structure, our procedure uses statistical techniques. The method was applied to a collection of 5990 English and Spanish parallel texts and evaluated by measuring the number of times the translation of a given document was identified as the most similar document. The good results showed the feasibility and usefulness of the approach.
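
The procedure described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been mapped to a weighted set of thesaurus descriptors, and it uses cosine similarity as the comparison measure; the abstract only speaks of a "distance between these representations", so the choice of cosine, the descriptor identifiers, the weights, and the function names `cosine_similarity` and `translation_hit_rate` are all illustrative assumptions rather than the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse descriptor-weight vectors (dicts)."""
    dot = sum(weight * b[desc] for desc, weight in a.items() if desc in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty representation is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def translation_hit_rate(source_docs, target_docs):
    """Fraction of source documents whose most similar target document
    is the one stored under the same key (assumed to be its translation)."""
    hits = 0
    for doc_id, vector in source_docs.items():
        best = max(target_docs,
                   key=lambda t: cosine_similarity(vector, target_docs[t]))
        if best == doc_id:
            hits += 1
    return hits / len(source_docs)


# Hypothetical descriptor identifiers and weights, for illustration only.
english = {
    "doc1": {"D_FISHERIES": 0.9, "D_QUOTA": 0.6},
    "doc2": {"D_FRAUD": 0.8, "D_CUSTOMS": 0.7},
}
spanish = {
    "doc1": {"D_FISHERIES": 0.8, "D_QUOTA": 0.5, "D_CUSTOMS": 0.1},
    "doc2": {"D_FRAUD": 0.9, "D_CUSTOMS": 0.6},
}

rate = translation_hit_rate(english, spanish)
print(f"Translation identified as most similar: {rate:.0%}")
```

Because both language versions are expressed in the same descriptor space, the comparison itself needs no translation step; the hit rate mirrors the evaluation criterion named in the abstract, here on toy data.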
