Written corpus ccKres 1.0

Authors: Peter Holozan, Miha Grčar, Tomaž Erjavec, Nataša Logar, Simon Krek

DOI:

Keywords: Encoding (semiotics), Newspaper, Artificial intelligence, Natural language processing, Corpus linguistics, XML, Computer science

Abstract: The ccKres corpus consists of 9,376 documents, each containing information about the source (e.g. newspapers, magazines), year of publication, text type (fiction, newspaper), and title and author if they are known. The corpus is POS-tagged and lemmatised, and encoded in XML TEI format (Text Encoding Initiative P5). It contains approximately 9% of the Kres corpus, a balanced corpus of Slovene: http://eng.slovenscina.eu/korpusi/kres.
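
As a rough illustration of what working with a POS-tagged, lemmatised TEI P5 document can look like, the sketch below reads word tokens from a small XML fragment with Python's standard library. The fragment is hypothetical and not taken from ccKres; the use of `<w>` elements with `lemma` and `msd` attributes, and the tag values shown, are assumptions that should be checked against the actual corpus files before reuse.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment for illustration only; NOT copied from ccKres.
# Attribute names (lemma, msd) and tag values are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="biti" msd="Va-r3s-n">je</w>
    <w lemma="korpus" msd="Ncmsn">korpus</w>
    <pc>.</pc>
  </s></p></body></text>
</TEI>"""


def iter_tokens(xml_text):
    """Yield (form, lemma, tag) for every <w> element in a TEI string."""
    root = ET.fromstring(xml_text)
    for w in root.iter(f"{{{TEI_NS}}}w"):
        # Missing attributes come back as None rather than raising.
        yield (w.text or "", w.get("lemma"), w.get("msd"))


tokens = list(iter_tokens(SAMPLE))
for form, lemma, tag in tokens:
    print(form, lemma, tag, sep="\t")
```

For full corpus files, `ET.iterparse` may be a better fit than `ET.fromstring`, since it avoids loading a whole document into memory at once.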
