Authors: Daniel Ferrés, Horacio Rodríguez
DOI: 10.1007/978-3-319-23826-5_30
Keywords: Artificial intelligence, Lemmatisation, Computer science, Query expansion, Term (time), Natural language processing, Re-ranking, Ranking (information retrieval), Information retrieval, Named-entity recognition, Deep linguistic processing, Weighting
Abstract: This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve Information Retrieval effectiveness. Geographical Knowledge Re-Ranking is performed with Gazetteers and conservative Toponym Disambiguation techniques that boost the ranking of geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: (1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to analyze the text collections and topics and to detect toponyms; (2) Stemming (Porter's algorithm) and Lemmatization are also applied, in combination with default stopwords filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments have been performed on the English Monolingual test collections of the GeoCLEF evaluations from the years 2005, 2006, 2007, and 2008, using the TF-IDF, BM25, and InL2 algorithms over unprocessed texts as baselines. Each collection (25 topics per evaluation) was tested separately, as was the fusion of all these collections (100 topics). The results of evaluating these techniques, including lemmatization, stemming, and both combined, show that these processes improve the Mean Average Precision (MAP) and RPrecision effectiveness measures, with statistical significance over the baselines in most of the experiments. The best results were obtained with the following techniques: Stemming, Geographical Knowledge Re-Ranking, and Query Expansion. Some configurations improved on the official results of the 2007 evaluation.
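
As an illustration of the Bo1 query expansion model named in the abstract, the following is a minimal sketch of how candidate expansion terms could be scored from the top-ranked (pseudo-relevant) documents. It uses the commonly cited Bo1 formula, w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N, where tf_x is the term frequency in the feedback documents, F is the term frequency in the whole collection, and N is the number of documents. The function names, parameters, and toy statistics are hypothetical and are not taken from the paper; the authors' actual setup is not described in this record.

```python
import math
from collections import Counter


def bo1_weights(feedback_docs, collection_tf, num_docs):
    """Score terms from pseudo-relevant documents with the Bo1 formula.

    feedback_docs: list of token lists (hypothetical top-ranked documents)
    collection_tf: dict mapping term -> frequency in the whole collection
    num_docs:      number of documents in the collection
    """
    # tf_x: frequency of each term across the feedback documents
    tf_x = Counter(tok for doc in feedback_docs for tok in doc)
    weights = {}
    for term, tf in tf_x.items():
        p_n = collection_tf[term] / num_docs
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def expand_query(query_terms, weights, k=2):
    """Append the k highest-weighted terms not already in the query."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    new_terms = [t for t in ranked if t not in query_terms][:k]
    return list(query_terms) + new_terms
```

Under these assumptions, a term that is frequent in the feedback documents but rare in the collection receives a high weight, so it is preferred as an expansion term over a common collection-wide term.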