Confidence Measure for Czech Document Classification

Authors: Pavel Král, Ladislav Lenc

DOI: 10.1007/978-3-319-18117-2_39

Keywords: Artificial intelligence, Computer science, Classifier (UML), Machine learning, Document classification, Czech, Latent Dirichlet allocation

Abstract: This paper deals with automatic document classification in the context of a real application for the Czech News Agency (CTK). The accuracy of our classifier is high; however, it is still important to improve the results. The main goal of this paper is thus to propose novel confidence measure approaches in order to detect and remove incorrectly classified samples. Two proposed methods are based on the posterior class probability, and the third one is a supervised approach which uses another classifier to determine if the result is correct. The methods are evaluated on a Czech newspaper corpus. We experimentally show that it is beneficial to integrate the confidence measures into the document classification task because they significantly improve the accuracy.
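
The abstract does not spell out how the two posterior-based measures are computed. As a minimal sketch, assuming one measure thresholds the highest class posterior and the other thresholds the gap between the two highest posteriors, the filtering step might look like the following. The function names, the threshold values, and the sample posteriors are hypothetical and are not taken from the paper.

```python
def absolute_confidence(posteriors):
    """Assumed measure 1: the highest posterior probability over all classes."""
    return max(posteriors)


def relative_confidence(posteriors):
    """Assumed measure 2: gap between the best and second-best posterior.

    Requires at least two classes.
    """
    ranked = sorted(posteriors, reverse=True)
    return ranked[0] - ranked[1]


def filter_confident(samples, measure, threshold):
    """Keep (predicted_class, posteriors) pairs whose confidence reaches the threshold.

    `samples` is a list of per-document posterior vectors; documents whose
    confidence falls below `threshold` are rejected (dropped from the output).
    """
    kept = []
    for posteriors in samples:
        if measure(posteriors) >= threshold:
            predicted = max(range(len(posteriors)), key=posteriors.__getitem__)
            kept.append((predicted, posteriors))
    return kept


# Hypothetical posteriors for two documents over three classes.
samples = [
    [0.90, 0.05, 0.05],  # clear winner: kept by both measures
    [0.40, 0.35, 0.25],  # ambiguous: rejected by both measures below
]
confident_abs = filter_confident(samples, absolute_confidence, threshold=0.5)
confident_rel = filter_confident(samples, relative_confidence, threshold=0.2)
```

Raising the threshold rejects more documents, so accuracy on the retained documents is traded against how many documents are kept. The third, supervised approach mentioned in the abstract would replace the fixed threshold with a second classifier and is not sketched here.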

References (34)
David Martin Ward Powers, Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. arXiv: Learning, vol. 2, pp. 37-63 (2011)
Tomáš Brychcín, Pavel Král, Novel Unsupervised Features for Czech Multi-label Document Classification. Mexican International Conference on Artificial Intelligence, pp. 70-79 (2014), 10.1007/978-3-319-13647-9_8
Michal Hrala, Pavel Král, Evaluation of the Document Classification Approaches. Computer Recognition Systems, pp. 877-885 (2013), 10.1007/978-3-319-00969-8_86
Harris Papadopoulos, A Cross-Conformal Predictor for Multi-label Classification. Artificial Intelligence Applications and Innovations, pp. 241-250 (2014), 10.1007/978-3-662-44722-2_26
Michal Hrala, Pavel Král, Multi-label Document Classification in Czech. Text, Speech and Dialogue, pp. 343-351 (2013), 10.1007/978-3-642-40585-3_44
Michal Konkol, Brainy: A Machine Learning Library. Artificial Intelligence and Soft Computing, pp. 490-499 (2014), 10.1007/978-3-319-07176-3_43
R. Chandrasekar, B. Srinivas, Using syntactic information in document filtering: a comparative study of part-of-speech tagging and supertagging. RIAO '97 Computer-Assisted Information Searching on Internet, pp. 531-545 (1997)
Kostas Proedrou, Ilia Nouretdinov, Volodya Vovk, Alex Gammerman, Transductive Confidence Machines for Pattern Recognition. Lecture Notes in Computer Science, pp. 381-390 (2002), 10.1007/3-540-36755-1_32
Pavel Král, Named Entities as New Features for Czech Document Classification. Computational Linguistics and Intelligent Text Processing, pp. 417-427 (2014), 10.1007/978-3-642-54903-8_35
Fayin Li, Harry Wechsler, Open world face recognition with credibility and confidence measures. Lecture Notes in Computer Science, pp. 462-469 (2003), 10.1007/3-540-44887-X_55