Authors: Jiali Yun, Liping Jing, Jian Yu, Houkuan Huang
DOI: 10.1016/J.ESWA.2011.08.027
Keywords:
Abstract: Text categorization is one of the most common themes in the data mining and machine learning fields. Unlike structured data, unstructured text is more difficult to analyze because it contains complicated syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) to represent text data, with one level representing syntactic information and the other representing semantic information. At the syntactic level, each document is represented as a term vector in which the value of each component is the term frequency and inverse document frequency. The Wikipedia concepts related to the terms in the syntactic level are used to represent the document at the semantic level. Meanwhile, we design a multi-layer classification framework (MLCLA) to make use of the semantic and syntactic information represented in the 2RM model. The MLCLA framework contains three classifiers. Two of them are applied to the syntactic level and the semantic level in parallel. The outputs of these two classifiers are combined and passed as input to the third classifier, which produces the final results. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that the proposed 2RM model plus the MLCLA framework improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term+Concept VSM) plus existing classification methods.
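The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The term-to-concept dictionary is a hypothetical stand-in for the Wikipedia concept lookup. The nearest-centroid classifier, the cosine scoring and all function names are assumptions chosen for brevity, since the abstract does not say which classifiers MLCLA uses. The toy documents are made up for illustration only.

```python
import math
from collections import Counter

# Hypothetical term -> concept map; a real system would query Wikipedia.
TERM_TO_CONCEPT = {
    "goalkeeper": "Association football", "striker": "Association football",
    "match": "Association football", "referee": "Association football",
    "senate": "Election", "ballot": "Election", "vote": "Election",
}

def to_concepts(tokens):
    """Semantic level: replace each known term by its related concept."""
    return [TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT]

def fit_idf(docs):
    """Smoothed inverse document frequency for every feature seen in docs."""
    n = len(docs)
    df = Counter(f for d in docs for f in set(d))
    return {f: math.log(n / df[f]) + 1.0 for f in df}

def vectorize(doc, idf):
    """Sparse TF-IDF vector; features unseen during fitting are dropped."""
    tf = Counter(doc)
    return {f: tf[f] * idf[f] for f in tf if f in idf}

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Assumed base learner: cosine similarity to per-class centroids."""
    def fit(self, vectors, labels):
        self.centroids = {}
        for vec, y in zip(vectors, labels):
            centroid = self.centroids.setdefault(y, Counter())
            for k, v in vec.items():
                centroid[k] += v  # sums suffice: cosine ignores scale
        self.classes = sorted(self.centroids)
        return self

    def scores(self, vec):
        return {c: cosine(vec, self.centroids[c]) for c in self.classes}

    def predict(self, vec):
        s = self.scores(vec)
        return max(self.classes, key=lambda c: s[c])

def stack(term_clf, concept_clf, term_vec, concept_vec):
    """Combine the outputs of the two parallel classifiers into one vector."""
    out = {f"term:{c}": v for c, v in term_clf.scores(term_vec).items()}
    out.update({f"concept:{c}": v
                for c, v in concept_clf.scores(concept_vec).items()})
    return out

# Toy training data (made up for illustration).
train_docs = [d.split() for d in (
    "goalkeeper match striker", "striker match",
    "senate vote ballot", "ballot senate")]
train_labels = ["sport", "sport", "politics", "politics"]

# Level 1 (syntactic): TF-IDF over terms. Level 2 (semantic): over concepts.
term_idf = fit_idf(train_docs)
concept_docs = [to_concepts(d) for d in train_docs]
concept_idf = fit_idf(concept_docs)
term_vecs = [vectorize(d, term_idf) for d in train_docs]
concept_vecs = [vectorize(d, concept_idf) for d in concept_docs]

# Two classifiers trained in parallel, one per level.
term_clf = CentroidClassifier().fit(term_vecs, train_labels)
concept_clf = CentroidClassifier().fit(concept_vecs, train_labels)

# Third classifier trained on the combined outputs. For brevity it reuses
# training-set outputs; held-out outputs would be the safer choice.
meta_vecs = [stack(term_clf, concept_clf, t, c)
             for t, c in zip(term_vecs, concept_vecs)]
meta_clf = CentroidClassifier().fit(meta_vecs, train_labels)

def classify(tokens):
    term_vec = vectorize(tokens, term_idf)
    concept_vec = vectorize(to_concepts(tokens), concept_idf)
    return meta_clf.predict(stack(term_clf, concept_clf, term_vec, concept_vec))
```

In this sketch, calling `classify(["referee"])` shows why the second level can help: the term "referee" never appears in the toy training set, so the term-level vector is empty, but its assumed concept still gives the third classifier a usable signal.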