Gazetteer-independent toponym resolution using geographic word profiles

Authors: Jason Baldridge, Grant DeLozier, Loretta London

Keywords: Geoparsing, Word (computer architecture), Language model, Entity linking, Computer science, Set (abstract data type), Spatial analysis, Resolution (logic), Information retrieval, Web content

Abstract: Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
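
The idea of overlapping geo-profiles across a text span can be illustrated with a small sketch. It is not the authors' implementation: the grid cells, the hand-written PROFILES table, the resolve function, and the unweighted sum are all assumptions made for illustration. The abstract says the real profiles are derived from local spatial statistics over geo-referenced language models, which this toy example does not attempt to compute.

```python
from collections import defaultdict

# Hypothetical geo-profiles: word -> {grid cell (lat, lon): weight}.
# The weights are invented for this example; they are not taken from the paper.
PROFILES = {
    "paris": {(48.5, 2.5): 0.7, (33.5, -95.5): 0.3},
    "texas": {(33.5, -95.5): 0.6, (30.5, -97.5): 0.4},
    "rodeo": {(33.5, -95.5): 0.5, (30.5, -97.5): 0.5},
    "seine": {(48.5, 2.5): 1.0},
}


def resolve(toponym, context, profiles=PROFILES):
    """Return the grid cell where the profiles of the toponym and its
    context words overlap most, or None if no word has a profile.

    Assumption: overlap is approximated by an unweighted sum of profile
    weights per cell. No gazetteer lookup is involved.
    """
    scores = defaultdict(float)
    for word in [toponym, *context]:
        for cell, weight in profiles.get(word.lower(), {}).items():
            scores[cell] += weight
    if not scores:
        return None
    # Pick the cell with the highest accumulated weight.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(resolve("Paris", ["Texas", "rodeo"]))
    print(resolve("Paris", ["Seine"]))
```

In this toy setup the same toponym resolves to different cells depending on the surrounding words, which is the behaviour the abstract attributes to context-driven profiles. A word with no profile simply contributes nothing to the sum.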
