Authors: Luigi Galavotti, Fabrizio Sebastiani, Maria Simi
Keywords:
Abstract: We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard REUTERS-21578 benchmark.
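
To make the FS setting in the abstract concrete, here is a minimal sketch that ranks the r distinct terms of a toy collection by the standard χ² statistic for one category and keeps the top r′. It is not the simplified variant the authors propose, which the abstract does not define. The function names `chi_square` and `select_features`, the token-list input format, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. empty category): treat the term as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, category, r_prime):
    """Return the r_prime terms with the highest chi-square score for `category`.

    docs:   list of token lists, one per document (assumed input format)
    labels: list of category names, one per document
    """
    n_pos = sum(1 for y in labels if y == category)
    n_neg = len(docs) - n_pos

    # Document frequencies of each term inside and outside the category.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == category else df_neg
        target.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)

    # Highest score first; ties broken alphabetically (an arbitrary choice).
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:r_prime]
```

Note that the standard statistic scores a term highly whether it signals the presence or the absence of a category, so a term that never occurs in the category can rank as high as one that always does.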