Authors: Maricela Bravo, Luis Fernando Hoyos Reyes, Domingo Rodríguez Benavides, Leonardo D Sánchez-Martínez
DOI:
Keywords:
Abstract: Multiple online collections and databases of scientific articles are publicly available. To take full advantage of these resources, it is necessary to process, arrange, and correlate texts with respect to a classification or ontology. To achieve an efficient organization and a more relevant correlation between texts, it is necessary to use a similarity measure for short texts. However, determining the best method to calculate the similarity between texts is an arduous task, since many similarity measures are reported in the literature. Additionally, the collection of texts to which the similarity measures are applied should be considered: while some measures are useful for some types of information sources, they fail when the collection of data changes. Therefore, it is necessary to have a method to evaluate the performance of similarity measures from a statistical perspective and in terms of the accuracy achieved by …