Authors: Nikola Ljubešić, Željko Agić
DOI:
Keywords:
Abstract: We present SETimes.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETimes parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support direct model transfer evaluation between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETimes.HR and the test sets, providing the state of the art in all the tasks. We make all the resources presented in the paper freely available under a very permissive licensing scheme.