Authors: Ladislav Lenc, Pavel Král
DOI:
Keywords: Layer (object-oriented design), Czech, Information retrieval, Text document, Document classification, Agency (sociology), Computer science, Order (business)
Abstract: This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in the Czech language. It is composed of text documents provided by the Czech News Agency and is freely available for research purposes at this http URL. The corpus was created to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly intended for evaluating multi-label document classification approaches, because a document is usually labelled with more than one label. Besides the information about the document classes, the corpus is also annotated at the morphological layer. The paper further presents the results of selected state-of-the-art methods on this corpus, so that new approaches can be easily compared with them.