Authors: Marcos Antonio Mouriño García, Roberto Pérez Rodríguez, Luis Anido Rifón
DOI: 10.1016/J.ARTMED.2018.04.007
Keywords: Artificial intelligence, Computer science, Multilingualism, German, Icelandic, Natural language processing, Interlanguage, Encyclopedias as Topic, Romanian, Machine translation, Classifier (UML)
Abstract: This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concept weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only with English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared with that of a classifier based on machine translation and of two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpora. The first one, Multi-Lingual UVigoMED (ML-UVigoMED), is composed of 23,647 Wikipedia documents about biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second one, English-French-Spanish-German UVigoMED (EFSG-UVigoMED), is composed of 19,210 biomedical abstracts extracted from MEDLINE, written in English, French, Spanish, and German. The performance of the proposed approach is superior to that of any of the state-of-the-art classifiers in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in tasks of multilingual classification of biomedical documents.
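
The conversion step described in the abstract, mapping a concept vector from another language into English concept space through interlanguage links, might look roughly like the sketch below. It is only an illustration under stated assumptions: the `INTERLANGUAGE_LINKS` table, the function name `to_english_concepts`, and the toy weights are all hypothetical, and the handling of unlinked or colliding concepts (drop and sum, respectively) is a guess rather than the authors' documented behaviour.

```python
from typing import Dict

# Hypothetical interlanguage-link table: language -> {local concept: English concept}.
# The entries are toy data for illustration, not taken from the article.
INTERLANGUAGE_LINKS: Dict[str, Dict[str, str]] = {
    "es": {"Corazón": "Heart", "Diabetes mellitus": "Diabetes"},
    "de": {"Herz": "Heart"},
}


def to_english_concepts(vector: Dict[str, float], lang: str) -> Dict[str, float]:
    """Map a concept-weight vector from `lang` into English concept space.

    Assumptions: concepts without an interlanguage link are dropped, and
    weights of concepts that map to the same English concept are summed.
    """
    links = INTERLANGUAGE_LINKS.get(lang, {})
    converted: Dict[str, float] = {}
    for concept, weight in vector.items():
        english = links.get(concept)
        if english is None:
            continue  # no link available, so the concept cannot be matched
        converted[english] = converted.get(english, 0.0) + weight
    return converted


# A toy Spanish document vector; the last concept has no interlanguage link.
spanish_doc = {"Corazón": 0.7, "Diabetes mellitus": 0.2, "Concepto sin enlace": 0.1}
english_vector = to_english_concepts(spanish_doc, "es")
```

A vector converted this way could then be passed to a classifier trained only on English concept vectors, which is the setting the abstract evaluates.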