New Slovene Corpora within the Communication In Slovene Project

Abstract: The paper presents three publicly available corpora of contemporary Slovene: a) Gigafida, a monolingual dynamic corpus of written language (1 billion words); b) KRES, a balanced subcorpus of written language (100 million words); c) GOS, a reference corpus of spoken Slovene (1 million words). The spoken and written data have been compiled since 2008. The billion-word corpus has already been compiled. The corpus is lemmatized and morpho-syntactically tagged, and partly syntactically annotated. A wide range of language features may be retrieved from it: syntactic and semantic information, as well as phraseology. Moreover, the corpus constitutes a basis for a lexical database and a modern corpus-based grammar, both of which are being developed within the project. The larger corpus is the foundation of a balanced subcorpus of the written language. The paper compares the main features of the two …
