Named Entities as New Features for Czech Document Classification

Author: Pavel Král

DOI: 10.1007/978-3-642-54903-8_35

Keywords: Computational linguistics, Word error rate, Feature selection, Newspaper, Natural language processing, Artificial intelligence, Word (computer architecture), Czech, Document classification, Computer science, Feature vector, Task (project management)

Abstract: This paper focuses on automatic document classification. The results will be used to develop a real application for the Czech News Agency. The main goal of this work is to propose new features based on Named Entities (NEs) for this task. Five different approaches to employing NEs are suggested and evaluated on a newspaper corpus. We show that these features do not significantly improve the score over the baseline word-based features: the classification error rate improves by only about 0.42% when the best approach is used.
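The abstract does not say how the five NE-based approaches work or which classifier is used. As an illustration only, the sketch below shows one plausible way to add NE features to a word-based bag-of-features representation: each recognized entity contributes an extra feature `NE:<type>:<text>` next to the ordinary lower-cased word features, and a multinomial Naive Bayes classifier with add-one smoothing is trained on the result. The feature scheme, the entity types, the classifier, the helper names (`features`, `NaiveBayes`, `train_model`) and the toy training data are all hypothetical assumptions, not the paper's method or corpus; entities are assumed to come from some external NE recognizer.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature scheme (an assumption, not the paper's method):
# lower-cased word counts, plus one extra feature per named entity
# written as "NE:<type>:<text>". Entities are assumed to be given as
# (text, type) pairs produced by some external NE recognizer.
def features(tokens, entities, use_nes=True):
    feats = Counter(t.lower() for t in tokens)
    if use_nes:
        for text, ne_type in entities:
            feats[f"NE:{ne_type}:{text.lower()}"] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Swapped in for illustration; the abstract does not name a classifier.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values()) for c, fc in self.feat_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of class c
            for f, k in feats.items():
                if f in self.vocab:  # features unseen in training are ignored
                    p = (self.feat_counts[c][f] + 1) / (self.totals[c] + v)
                    score += k * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, hand-made (tokens, entities, label) examples; NOT the paper's corpus.
TRAIN = [
    (["Sparta", "won", "the", "match"], [("Sparta", "org")], "sport"),
    (["Jágr", "scored", "a", "goal"], [("Jágr", "person")], "sport"),
    (["Parliament", "passed", "the", "law"], [("Parliament", "org")], "politics"),
    (["Zeman", "signed", "the", "law"], [("Zeman", "person")], "politics"),
]


def train_model(use_nes=True):
    # use_nes=False gives the word-only baseline; True adds the NE features.
    docs = [features(toks, ents, use_nes) for toks, ents, _ in TRAIN]
    return NaiveBayes().fit(docs, [label for _, _, label in TRAIN])


if __name__ == "__main__":
    model = train_model(use_nes=True)
    test_doc = features(["Sparta", "lost", "the", "match"], [("Sparta", "org")])
    print(model.predict(test_doc))  # prints "sport"
```

Toggling `use_nes` switches between a word-only baseline and a word-plus-NE variant, which is the kind of comparison the abstract reports; on real data the two settings would be compared by classification error rate rather than on a single toy document.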
