Czech Dataset for Semantic Similarity and Relatedness.

Authors: Miloslav Konopík, Ondřej Pražák, David Steinberger

DOI: 10.26615/978-954-452-049-6_053

Keywords: Word (computer architecture), Text corpus, Czech, Agreement, Artificial intelligence, Semantic similarity, Natural language processing, Annotation, Spearman's rank correlation coefficient, Computer science

Abstract: This paper introduces a Czech dataset for semantic similarity and relatedness. The dataset contains word pairs with hand-annotated scores that indicate the relatedness of the words. It comprises 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora, including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness.
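
The agreement figure above is an average of Spearman rank correlations between annotators. The abstract does not say how the average was formed, so the following is only a minimal sketch under one common assumption: Spearman's coefficient is computed for every pair of annotators and the results are averaged. The function names and the toy scores are hypothetical and are not taken from the dataset.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_agreement(scores_by_annotator):
    """Average Spearman coefficient over all annotator pairs (an assumption)."""
    pairs = combinations(scores_by_annotator, 2)
    return mean(spearman(a, b) for a, b in pairs)


# Made-up scores for six word pairs from three hypothetical annotators.
toy_scores = [
    [0, 1, 2, 3, 4, 4],
    [0, 2, 1, 3, 4, 3],
    [1, 1, 2, 2, 4, 3],
]
agreement = mean_pairwise_agreement(toy_scores)
print(f"mean pairwise Spearman: {agreement:.2f}")
```

With 5 annotators this averages over 10 pairs. Another plausible reading, correlating each annotator with the mean of the others, would give a different number.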
