Authors: Miloslav Konopík, Ondřej Pražák, David Steinberger
DOI: 10.26615/978-954-452-049-6_053
Keywords: Word (computer architecture), Text corpus, Czech, Agreement, Artificial intelligence, Semantic similarity, Natural language processing, Annotation, Spearman's rank correlation coefficient, Computer science
Abstract: This paper introduces a Czech dataset for semantic similarity and relatedness. The dataset contains word pairs with hand-annotated scores that indicate the semantic similarity and relatedness of the words. It comprises 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora, including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness.
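
The abstract reports an average Spearman correlation as its agreement figure but does not say how the average was formed. The sketch below shows one common reading, assumed here rather than taken from the paper: compute Spearman's rho for every pair of annotators and take the mean. The function names and the toy scores are hypothetical and are not drawn from the dataset.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either annotator gave constant scores.
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_agreement(scores_by_annotator):
    """Average Spearman's rho over every unordered pair of annotators."""
    pairs = combinations(scores_by_annotator, 2)
    return mean(spearman(a, b) for a, b in pairs)


# Toy scores (invented for illustration): 3 annotators rating 5 word pairs.
toy_scores = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
]
agreement = mean_pairwise_agreement(toy_scores)
```

With 5 annotators this averages over 10 annotator pairs. Other conventions, such as correlating each annotator with the mean of the remaining ones, would give a different number, so the result of this sketch is not directly comparable to the reported r = 0.81 without knowing which convention the authors used.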