Authors: Supphachai Thaicharoen, Krzysztof J. Cios, Tom Altman
DOI:
Keywords: Document clustering, Representation (mathematics), Vector space model, Computer science, Term (time), Data mining, TREC Genomics, Document classification, Binary classification, Support vector machine
Abstract: Term signal is an existing text representation that depicts a term as a vector of frequencies of occurrences in a number of user-defined partitions of a document. Although term signal augments the traditional vector space model with patterns of term occurrences, its document division is not coherent with the actual logical structure of the document. In this paper, we propose a novel document model, termed Structure-Based Document Model with Discrete Wavelet Transforms (SDMDWT), that exploits structural information of documents and mathematical transforms for document representation. The proposed SDMDWT enhances the term signal concept by additionally taking into consideration the document's logical structure during document division. We evaluated SDMDWT on two different domains of standard data sets, WebKB 4-Universities and TREC Genomics 2005, using Support Vector Machines for binary classification. The experimental results show that our proposed model demonstrates promising improvements in classification performance over the existing models.
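
To make the term signal idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes equal-width partitions by token position for the plain term signal, one partition per logical section for the structure-based variant, a power-of-two number of partitions, and the orthonormal Haar wavelet; the abstract does not state which wavelet or division rule SDMDWT actually uses. The function names `term_signal`, `structure_signal`, and `haar_dwt` are hypothetical.

```python
from math import sqrt


def term_signal(tokens, term, n_parts):
    """Count occurrences of `term` in each of `n_parts` equal-width
    partitions of the token sequence (assumed partitioning rule)."""
    signal = [0] * n_parts
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map token position i to its partition index.
            signal[i * n_parts // n] += 1
    return signal


def structure_signal(sections, term):
    """Count occurrences of `term` per logical section, where `sections`
    is a list of token lists (assumed stand-in for structure-based division)."""
    return [section.count(term) for section in sections]


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length signal.
    Returns [approximation, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        approx = [(a + b) / sqrt(2) for a, b in pairs]
        # Prepend so that coarser detail coefficients come first.
        details = [(a - b) / sqrt(2) for a, b in pairs] + details
    return approx + details


tokens = "wavelet a b c d e f wavelet".split()
signal = term_signal(tokens, "wavelet", 4)  # [1, 0, 0, 1]
coeffs = haar_dwt(signal)
```

The transformed coefficients could then serve as features for a classifier such as an SVM. Because the Haar transform used here is orthonormal, the sum of squared coefficients equals the sum of squared counts in the original signal.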