Authors: Naftali Tishby, Noam Slonim
DOI:
Keywords:
Abstract: The recently introduced Information Bottleneck method [21] provides an information-theoretic framework for extracting features of one variable that are relevant for the values of another variable. Several previous works have already suggested applying this method to document clustering, gene expression data analysis, spectral analysis and more. In this work we present a novel implementation of this method for supervised text classification. Specifically, we apply the information bottleneck method to find word clusters that preserve the information about document categories, and use these clusters as features for classification. Previous work [1] used a similar clustering procedure to show that word clusters can significantly reduce the feature space dimensionality, with only a minor change in classification accuracy. In this work we reproduce these results and go further to show that when the training sample is small, word clusters can yield a significant improvement in classification accuracy (up to 18%) over the performance using the words directly.