Geographical Evaluation of Word Embeddings

Authors: Michal Nykl, Tomáš Brychcín, Michal Konkol, Tomáš Hercig

Keywords: Error analysis, Visualization, Word (computer architecture), Set (abstract data type), Semantic space, Natural language processing, Computer science, Position (vector), Artificial intelligence

Abstract: Word embeddings are commonly compared either with human-annotated word similarities or through improvements in natural language processing tasks. We propose a novel principle that compares the information from word embeddings with reality. We implement this principle by comparing the information in the word embeddings with the geographical positions of cities. Our evaluation linearly transforms the semantic space to optimally fit the real positions of cities and measures the deviation between the position given by the word embeddings and the real position. A set of well-known word embeddings with state-of-the-art results was evaluated. We also introduce a visualization that helps with error analysis.
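
The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes planar (x, y) city positions, an affine least-squares fit solved through the normal equations, and mean Euclidean distance as the deviation measure. The toy vectors, positions, and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    b may have several columns; a must be square and non-singular."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_linear_map(vectors, positions):
    """Least-squares affine map from embedding vectors to positions."""
    x = [list(v) + [1.0] for v in vectors]  # append a bias term
    d, k = len(x[0]), len(positions[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * p[j] for r, p in zip(x, positions)) for j in range(k)]
           for i in range(d)]
    return solve(xtx, xty)

def predict(w, vector):
    """Map one embedding vector to a predicted position."""
    x = list(vector) + [1.0]
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def mean_deviation(w, vectors, positions):
    """Average Euclidean distance between predicted and real positions."""
    total = sum(math.dist(predict(w, v), p) for v, p in zip(vectors, positions))
    return total / len(vectors)

# Toy data (hypothetical): 3-dimensional "embeddings" of five cities whose
# positions are an exact affine function of the vectors.
vectors = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
positions = [[3.0, -1.0], [5.0, -0.5], [2.0, -1.0], [3.0, 0.0], [4.0, 0.5]]

w = fit_linear_map(vectors, positions)
print(mean_deviation(w, vectors, positions))
```

On this toy data the fit is exact up to floating-point error, so the printed deviation is close to zero. With real embeddings the residual deviation would serve as the evaluation score, and a distance measure that accounts for the Earth's curvature may be more appropriate than the planar one assumed here.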
