Authors: Michal Hrala, Pavel Král
DOI: 10.1007/978-3-319-00969-8_86
Keywords: Czech, Artificial intelligence, Feature selection, Naive Bayes classifier, Computer science, Feature vector, Support vector machine, Document classification, Natural language processing, Class (biology), Principle of maximum entropy
Abstract: This paper deals with one class automatic document classification. Five feature selection methods and three classifiers are evaluated on a Czech corpus in order to build an efficient classification system. Lemmatization and POS tagging are used for a precise representation of the documents. We demonstrate that POS tag filtering is very important, while lemmatization plays only a marginal role in classification. We also show that the Maximum Entropy and Support Vector Machine classifiers are robust to the feature vector size and significantly outperform the Naive Bayes classifier in terms of classification accuracy. The best accuracy is about 90%, which is sufficient for an application at the News Agency, our commercial partner.
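The document representation described in the abstract (lemmatization plus POS tag filtering) can be illustrated with a minimal sketch. It is not the authors' implementation. The input format of (word, lemma, tag) triples, the tag names, the set of kept tags, and the function names `document_features` and `to_vector` are all assumptions made for illustration. The abstract does not name the five feature selection methods, so none is shown here.

```python
from collections import Counter

# Assumption: each token is a (word, lemma, pos_tag) triple, as an external
# lemmatizer and POS tagger might produce. The tag names and the set of kept
# tags are hypothetical and not taken from the paper.
KEPT_TAGS = {"NOUN", "ADJ", "VERB"}


def document_features(tokens, kept_tags=KEPT_TAGS, use_lemmas=True):
    """Count the terms of one document, keeping only tokens whose POS tag is in kept_tags."""
    counts = Counter()
    for word, lemma, tag in tokens:
        if tag not in kept_tags:
            continue  # POS tag filtering: drop prepositions, punctuation, etc.
        term = lemma if use_lemmas else word.lower()
        counts[term] += 1
    return counts


def to_vector(counts, vocabulary):
    """Map term counts onto a fixed vocabulary order to obtain a feature vector."""
    return [counts.get(term, 0) for term in vocabulary]


# Toy Czech sentence: "Vláda schválila nový zákon o daních."
doc = [
    ("Vláda", "vláda", "NOUN"),
    ("schválila", "schválit", "VERB"),
    ("nový", "nový", "ADJ"),
    ("zákon", "zákon", "NOUN"),
    ("o", "o", "ADP"),
    ("daních", "daň", "NOUN"),
    (".", ".", "PUNCT"),
]

features = document_features(doc)
vocabulary = sorted(features)
vector = to_vector(features, vocabulary)
```

Setting `use_lemmas=False` keeps lowercased surface forms instead of lemmas, which gives a simple way to compare the two representations on the same filtered tokens.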