Author: Pavel Král
DOI: 10.1007/978-3-642-54903-8_35
Keywords: Computational linguistics, Word error rate, Feature selection, Newspaper, Natural language processing, Artificial intelligence, Word (computer architecture), Czech, Document classification, Computer science, Feature vector, Task (project management)
Abstract: This paper is focused on automatic document classification. The results will be used to develop a real application for the Czech News Agency. The main goal of this work is to propose new features based on Named Entities (NEs) for this task. Five different approaches to employing NEs are suggested and evaluated on a newspaper corpus. We show that these features do not significantly improve the classification score over the baseline word-based features. The classification error rate improvement is only about 0.42% when the best approach is used.