Named Entities as New Features for Czech Document Classification

Author: Pavel Král

Keywords: Computational linguistics, Word error rate, Feature selection, Newspaper, Natural language processing, Artificial intelligence, Word (computer architecture), Czech, Document classification, Computer science, Feature vector, Task (project management)

Abstract: This paper focuses on automatic document classification. The results will be used to develop a real application for the Czech News Agency. The main goal of this work is to propose new features based on Named Entities (NEs) for this task. Five different approaches to employing NEs are suggested and evaluated on a newspaper corpus. We show that these features do not significantly improve the score over the baseline word-based features. The classification error rate improvement is only about 0.42% when the best approach is used.


