Czech Text Document Corpus v 2.0

Authors: Ladislav Lenc, Pavel Král

Keywords: Layer (object-oriented design), Czech, Information retrieval, Text document, Document classification, Agency (sociology), Computer science, Order (business)

Abstract: This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in the Czech language. It is composed of text documents provided by the Czech News Agency and is freely available for research purposes at this http URL. The corpus was created in order to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly dedicated to the evaluation of multi-label document classification approaches, because one document is usually labelled with more than one label. Besides the information about the document classes, the corpus is also annotated at the morphological layer. The paper further shows the results of selected state-of-the-art methods on this corpus to offer the possibility of an easy comparison with these approaches.
