Authors: Pio Nardiello, Fabrizio Sebastiani, Alessandro Sperduti
Keywords: Text categorization, Discretization, Weighting, Boosting (machine learning), Artificial intelligence, Pattern recognition, Computer science, AdaBoost, Machine learning, Categorization, Boosting methods for object categorization, Entropy (information theory)
Abstract: We focus on two recently proposed algorithms in the family of "boosting"-based learners for automated text classification, ADABOOST.MH and ADABOOST.MHKR. While the former is a realization of the well-known ADABOOST algorithm specifically aimed at multilabel text categorization, the latter is a generalization of the former based on the idea of learning a committee of classifier sub-committees. Both algorithms have been among the best performers in text categorization experiments so far. A problem in the use of both algorithms is that they require documents to be represented by binary vectors, indicating the presence or absence of the terms in the document. As a consequence, these algorithms cannot take full advantage of the "weighted" representations (consisting of vectors of continuous attributes) that are customary in information retrieval tasks, and that provide a much more significant rendition of the document's content than binary representations. In this paper we address the problem of exploiting the potential of weighted representations in the context of ADABOOST-like algorithms by discretizing the continuous attributes through the application of entropy-based discretization methods. We present experimental results on the Reuters-21578 text categorization collection, showing that for both algorithms the version with discretized continuous attributes outperforms the version with traditional binary representations.
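To make the idea of entropy-based discretization concrete, the following is a minimal sketch, not the authors' actual procedure: the abstract does not say which entropy-based method is used, so this example assumes the simplest variant, a single binary cut chosen to maximize information gain for one continuous term weight against one binary category label. The function names (`entropy`, `best_split`, `discretize`) and the toy data are hypothetical and serve only as illustration.

```python
import math
from typing import List, Optional, Tuple


def entropy(labels: List[int]) -> float:
    """Shannon entropy (in bits) of a list of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(weights: List[float],
               labels: List[int]) -> Optional[Tuple[float, float]]:
    """Return (threshold, information_gain) for the best single cut point.

    Candidate thresholds are midpoints between consecutive distinct
    weights. Returns None when all weights are equal (no cut possible).
    """
    pairs = sorted(zip(weights, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Size-weighted average entropy of the two intervals after the cut.
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - cond
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


def discretize(weight: float, threshold: float) -> int:
    """Map a continuous term weight to a binary attribute (assumed rule)."""
    return 1 if weight > threshold else 0


# Toy data (hypothetical): weights of one term in six documents, and
# whether each document belongs to the category of interest.
weights = [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, gain = best_split(weights, labels)
binary = [discretize(w, threshold) for w in weights]
```

Under this assumed scheme, each continuous attribute is replaced by a binary attribute that records which side of the cut it falls on, so a learner that accepts only binary vectors can still use part of the information carried by the weights. Recursive variants that produce more than two intervals per attribute are also possible but are not shown here.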