Cross-Lingual Document Similarity Calculation Using the Multilingual Thesaurus EUROVOC

Authors: Ralf Steinberger, Bruno Pouliquen, Johan Hagman

Abstract: We present an approach to calculating the semantic similarity of documents written in the same or in different languages. The calculation is achieved by representing the document contents in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and then calculating the distance between these representations. While EUROVOC is a carefully handcrafted knowledge structure, our procedure uses statistical techniques. The method was applied to a collection of 5990 English and Spanish parallel texts and evaluated by measuring the number of times the translation of a given document was identified as the most similar document. The good results showed the feasibility and usefulness of the approach.
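The abstract does not name the distance measure, so the following Python sketch is only an illustration under stated assumptions. It assumes each document has already been assigned weighted EUROVOC descriptors (the statistical assignment step is not shown) and compares the resulting sparse vectors with cosine similarity, a common choice that is not confirmed by the abstract. The descriptor labels, weights, and function names are hypothetical; the labels stand in for language-independent descriptor identifiers, which is why English and Spanish documents share them. The evaluation function mirrors the measure described above: the share of documents whose own translation is ranked as the most similar document.

```python
import math
from typing import Dict, List

# Assumption: a document is a sparse map from (hypothetical) EUROVOC
# descriptor labels to assignment weights; labels are language-independent.
DescriptorVector = Dict[str, float]


def cosine_similarity(a: DescriptorVector, b: DescriptorVector) -> float:
    """Cosine of the angle between two sparse descriptor-weight vectors."""
    dot = sum(w * b[d] for d, w in a.items() if d in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation is treated as dissimilar
    return dot / (norm_a * norm_b)


def translation_rank1_rate(source_docs: List[DescriptorVector],
                           target_docs: List[DescriptorVector]) -> float:
    """Fraction of source documents whose most similar target document is
    their translation. Assumes source_docs[i] is aligned with target_docs[i]."""
    hits = 0
    for i, src in enumerate(source_docs):
        scores = [cosine_similarity(src, tgt) for tgt in target_docs]
        best = max(range(len(scores)), key=scores.__getitem__)  # first max wins ties
        if best == i:
            hits += 1
    return hits / len(source_docs)


# Hypothetical toy data: three English documents and their Spanish translations,
# both already mapped onto the same descriptor labels.
english = [
    {"fishing regulations": 0.9, "marine resources": 0.6},
    {"agricultural policy": 0.8, "milk": 0.5},
    {"public health": 0.7, "tobacco": 0.9},
]
spanish = [
    {"fishing regulations": 0.8, "marine resources": 0.7, "Spain": 0.2},
    {"agricultural policy": 0.9, "milk": 0.4},
    {"public health": 0.6, "tobacco": 0.8, "advertising": 0.3},
]

print(translation_rank1_rate(english, spanish))  # prints 1.0
```

Because both languages are projected onto the same descriptor space, no translation dictionary or machine translation is needed at comparison time; the cross-lingual step is pushed entirely into the (here assumed) descriptor assignment.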
