Interactive cleaning for automatic document clustering and categorization

Authors: Caroline Privault, Jean-Michel Renders, Ludovic Menuge

Keywords: Document clustering, Categorization, Cluster analysis, Ambiguity, Computer science, Similarity (network science), User input, Outlier, Class (biology), Information retrieval

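Abstract: Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents, indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user-selected outlier criterion. Ambiguity measures are computed for the documents, indicative of the number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if it has higher similarity with the document than the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.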
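
The measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the per-class similarities are taken to be scores in [0, 1] (for example, posterior class probabilities), the outlier measure is taken as one minus the best class similarity, and the ambiguity measure as a count of classes at or above a threshold. The function names, class names, threshold, and sample values are all hypothetical and are not the authors' definitions.

```python
from typing import Dict, List, Optional

# Hypothetical input: similarity of one document to each class under the model,
# e.g. posterior probabilities P(class | document). Values are illustrative only.
Similarities = Dict[str, float]


def outlier_measure(sims: Similarities) -> float:
    """Assumed definition: 1 minus the best class similarity (higher = poorer fit)."""
    return 1.0 - max(sims.values())


def ambiguity_measure(sims: Similarities, threshold: float = 0.3) -> int:
    """Assumed definition: number of classes whose similarity meets the threshold."""
    return sum(1 for s in sims.values() if s >= threshold)


def corrective_label(sims: Similarities, annotated: str) -> Optional[str]:
    """Return a class more similar to the document than its annotated label, if any."""
    best = max(sims, key=lambda c: sims[c])
    return best if sims[best] > sims[annotated] else None


def flag_outliers(docs: Dict[str, Similarities], criterion: float) -> List[str]:
    """List documents whose outlier measure exceeds the user-selected criterion."""
    return [name for name, sims in docs.items() if outlier_measure(sims) > criterion]


# Made-up example data: three documents scored against three classes.
docs = {
    "doc_a": {"contracts": 0.90, "invoices": 0.05, "memos": 0.05},
    "doc_b": {"contracts": 0.40, "invoices": 0.35, "memos": 0.25},
    "doc_c": {"contracts": 0.34, "invoices": 0.33, "memos": 0.33},
}

flagged = flag_outliers(docs, criterion=0.5)
suggestion = corrective_label(docs["doc_b"], annotated="invoices")
```

In an interactive loop of the kind the abstract describes, the flagged documents and suggested labels would be shown to the user, and the user's corrections would feed the next clustering or categorization pass. That retraining step is omitted here because the abstract does not specify it.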
