Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval

Authors: Daniel Ferrés, Horacio Rodríguez

DOI: 10.1007/978-3-319-23826-5_30

Keywords: Artificial intelligence, Lemmatisation, Computer science, Query expansion, Term (time), Natural language processing, Re-ranking, Ranking (information retrieval), Information retrieval, Named-entity recognition, Deep linguistic processing, Weighting

Abstract: This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve Geographical Information Retrieval effectiveness. Geographical Knowledge Re-Ranking is performed with Geographical Gazetteers and conservative Toponym Disambiguation techniques that boost the ranking of the geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: (1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to analyze the text collections and topics to detect toponyms; (2) Stemming (Porter's algorithm) and Lemmatization are also applied, in combination with default stopwords filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments have been performed with the English Monolingual test collections of the GeoCLEF evaluations (from years 2005, 2006, 2007, and 2008), using the TF-IDF, BM25, and InL2 Information Retrieval algorithms over unprocessed texts as baselines. The experiments have been performed with each GeoCLEF test collection (25 topics per evaluation) separately and with the fusion of all these collections (100 topics). The results of evaluating these techniques, including lemmatization, stemming, and the combination of both, show that these processes improve the Mean Average Precision (MAP) and RPrecision effectiveness measures, with statistical significance over the baselines in most of them. The best results are obtained with combinations of techniques that include Stemming and Query Expansion. Some configurations improved the official results at GeoCLEF 2007.
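The two effectiveness measures named in the abstract, MAP and RPrecision, have standard definitions that can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the document and topic identifiers, and the toy rankings are all hypothetical, and relevance is treated as binary.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents for the topic."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranked[:r] if doc_id in relevant) / r


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(topic, []), rel)
              for topic, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: two topics with made-up rankings and judgments.
runs = {"topic1": ["d1", "d2", "d3", "d4"], "topic2": ["d5", "d6"]}
qrels = {"topic1": {"d1", "d3"}, "topic2": {"d6"}}

print(mean_average_precision(runs, qrels))
print(r_precision(runs["topic1"], qrels["topic1"]))
```

On this toy data, topic1 has relevant documents at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2, and topic2 has its single relevant document at rank 2, giving 1/2. Averaging over a fused set of topics, as in the 100-topic setting the abstract describes, only changes the number of entries in the two dictionaries.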

References (16)
José M. Perea-Ortega, Miguel A. García-Cumbreras, L. Alfonso Ureña-López, Manuel García-Vega. Geo-textual relevance ranking to improve a text-based retrieval for geographic queries. International Conference Natural Language Processing, pp. 278-281 (2011). DOI: 10.1007/978-3-642-22327-3_38
Davide Buscaldi, Paolo Rosso. Explicit Query Diversification for Geographical Information Retrieval. European Conference on Information Retrieval, pp. 73-80 (2011).
Rui Wang, Günter Neumann. Ontology-based query construction for GeoCLEF. Cross Language Evaluation Forum, pp. 880-884 (2008). DOI: 10.1007/978-3-642-04447-2_116
Christa Womser-Hacker, Thomas Mandl, Diana Santos, Giorgio Maria Di Nunzio, Nicola Ferro, Fredric C. Gey, Mark Sanderson. An Evaluation Resource for Geographic Information Retrieval. Language Resources and Evaluation (2008).
Linda L. Hill. Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints. European Conference on Research and Advanced Technology for Digital Libraries, pp. 280-290 (2000). DOI: 10.1007/3-540-45268-0_26
Giambattista Amati. Probability models for information retrieval based on divergence from randomness. British Library, British Thesis Service (2003).
Iadh Ounis, Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Douglas Johnson. Terrier information retrieval platform. European Conference on Information Retrieval, pp. 517-519 (2005). DOI: 10.1007/978-3-540-31865-1_37
Ray R. Larson, Fredric C. Gey, Vivien Petras. Berkeley at GeoCLEF: logistic regression and fusion for geographic information retrieval. Cross-Language Evaluation Forum, pp. 963-976 (2005). DOI: 10.1007/11878773_108
Mark D. Smucker, James Allan, Ben Carterette. A comparison of statistical significance tests for information retrieval evaluation. Conference on Information and Knowledge Management, pp. 623-632 (2007). DOI: 10.1145/1321440.1321528
Christopher B. Jones, Ross S. Purves. Geographical information retrieval. International Journal of Geographical Information Science, vol. 22, pp. 219-228 (2008). DOI: 10.1080/13658810701626343