Authors: Vipin Balachandran, Deepak P, Deepak Khemani
DOI: 10.1007/S10115-011-0446-9
Keywords:
Abstract: Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules that are composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the format of the rules; RGC employs disjunctions and conjunctions in rule generation, whereas RGC-D rules are simple disjunctions of conditions signifying the presence of various words. In both cases, each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former leads to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering and also an approach we provide for generating interpretable models for general clusterings, both by significant margins. We empirically show that the purity and f-measure losses to achieve interpretability can be as little as 3 and 5%, respectively, using the algorithms presented herein.
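The two rule formats named in the abstract can be illustrated with a minimal sketch. This is not the RGC or RGC-D algorithm itself; it only shows how a rule, once given, determines cluster membership. The representation (a rule as a list of conjunctions over word-presence literals), the helper names `tokenize`, `satisfies`, and `cluster_by_rules`, and the toy documents and rules are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a literal is (word, must_be_present);
# a rule is a disjunction (outer list) of conjunctions (inner lists).
Literal = Tuple[str, bool]
Rule = List[List[Literal]]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a real system would do more."""
    return set(text.lower().split())


def satisfies(words: Set[str], rule: Rule) -> bool:
    """True if at least one conjunction has all of its literals satisfied."""
    return any(
        all((word in words) == present for word, present in conjunction)
        for conjunction in rule
    )


def cluster_by_rules(docs: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, Set[str]]:
    """Each cluster is exactly the set of documents satisfying its rule."""
    return {
        name: {doc_id for doc_id, text in docs.items()
               if satisfies(tokenize(text), rule)}
        for name, rule in rules.items()
    }


# Toy data, invented for this sketch.
docs = {
    "d1": "the striker scored a goal",
    "d2": "the election vote count",
    "d3": "goal of the new tax policy vote",
    "d4": "match ended without a goal",
}

rules = {
    # Disjunction of conjunctions with an absence condition:
    # (goal AND NOT vote) OR match
    "sports": [[("goal", True), ("vote", False)], [("match", True)]],
    # Simple disjunction of word-presence conditions: election OR vote
    "politics": [[("election", True)], [("vote", True)]],
}

clusters = cluster_by_rules(docs, rules)
```

In this sketch the `sports` rule follows the richer format the abstract attributes to RGC, and the `politics` rule follows the presence-only disjunction format attributed to RGC-D. Because membership is defined entirely by the rule, editing a rule by hand reconfigures the cluster directly.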