Authors: Shanel Reyes-Palacios, Edwin Aldana-Bobadilla, Ivan Lopez-Arevalo, Alejandro Molina-Villegas
DOI:
Keywords:
Abstract: Modern text mining techniques seek to extract information in useful formats, such as georeferences, from digital documents. Automatic recognition of location names in text is usually handled by Named Entity Recognition (NER) systems. Most current NER systems are based on machine learning and detect location entities in digital documents with very high accuracy, especially when the texts are in English, because annotated corpora are scarce for other languages. However, recent studies address the challenge of taking the output labels of a NER system and retrieving, from a gazetteer, their exact, unambiguous geographical coordinates. This is challenging mainly because toponyms tend to be highly ambiguous, so research on disambiguation methods is relevant. In this paper we describe some of the main ideas behind a method that associates locations with geographical data while removing possible confusion between entities that share the same name. So far, we have accomplished geographic NER and coordinate retrieval, but the main research is still in progress. We discuss the state of the art in geoparsing at length, explain how our Geographic Entity Recognition module works, and finally describe the research proposal, focusing on ambiguity detection.