Czech Dataset for Semantic Similarity and Relatedness.

Authors: Miloslav Konopík, Ondřej Pražák, David Steinberger

Keywords: Word (computer architecture), Text corpus, Czech, Agreement, Artificial intelligence, Semantic similarity, Natural language processing, Annotation, Spearman's rank correlation coefficient, Computer science

Abstract: This paper introduces a Czech dataset for semantic similarity and relatedness. The dataset contains word pairs with hand-annotated scores that indicate the relatedness of the words. It comprises 953 word pairs compiled from 9 different sources. It also includes the words' contexts taken from real text corpora, with extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators, and the average Spearman correlation coefficient of annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness.
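The agreement figure above is a Spearman rank correlation averaged across annotators. The abstract does not say how the average is taken. The sketch below assumes the mean over all annotator pairs; a mean of each annotator against the rest would be an equally plausible reading. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. The same `spearman` function is also a common way to score a similarity method against the averaged human scores. All function names and the toy scores are hypothetical and are not taken from the dataset.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_agreement(annotations):
    """Average Spearman rho over all annotator pairs (assumed scheme).

    annotations: one list of scores per annotator, aligned by word pair.
    """
    pairs = list(combinations(annotations, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def gold_scores(annotations):
    """Per-word-pair mean of the annotators' scores."""
    return [sum(col) / len(col) for col in zip(*annotations)]


if __name__ == "__main__":
    # Toy scores from 3 hypothetical annotators for 5 hypothetical word pairs.
    annotators = [
        [9, 7, 5, 2, 1],
        [8, 7, 6, 3, 0],
        [9, 6, 6, 1, 2],
    ]
    print("agreement:", mean_pairwise_agreement(annotators))

    # Hypothetical similarity scores from some method, compared with the gold.
    model = [0.91, 0.55, 0.60, 0.20, 0.10]
    print("model vs. gold:", spearman(model, gold_scores(annotators)))
```

Spearman's rho is used rather than Pearson's r because only the ordering of pairs matters. A method is not penalised for producing scores on a different scale than the human annotators.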

