Interpretable and reconfigurable clustering of document datasets by deriving word-based rules

Authors: Vipin Balachandran, Deepak P, Deepak Khemani

DOI: 10.1007/S10115-011-0446-9

Abstract: Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters, and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the rule format: RGC employs disjunctions and conjunctions in rule generation, whereas RGC-D rules are simple disjunctions of conditions signifying the presence of various words. In both cases, each cluster consists of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former lead to more accurate clustering. We show that our approaches outperform, by significant margins, both the unsupervised decision tree approach for rule-generating clustering and an approach we provide for generating interpretable models for general clusterings. We empirically show that the purity and f-measure losses incurred to achieve interpretability can be as little as 3% and 5%, respectively, using the algorithms presented herein.
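To make the rule formats concrete, here is a minimal sketch, assuming rules are written in disjunctive normal form over word presence/absence and documents are treated as bags of lowercase whitespace-separated words. It does not reproduce the RGC or RGC-D rule-generation procedures; it only shows how a given rule defines its cluster (exactly the documents that satisfy it) and how purity is scored. The function names, the toy documents, and the rules are hypothetical illustrations, and `purity` uses the standard majority-class definition, assuming disjoint clusters.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A literal is (word, must_be_present); a conjunction is a list of literals;
# a rule is a disjunction (list) of conjunctions, i.e. a DNF formula.
Literal = Tuple[str, bool]
Rule = List[List[Literal]]


def satisfies(doc_words: Set[str], rule: Rule) -> bool:
    """True if the document satisfies at least one conjunction of the rule."""
    return any(
        all((word in doc_words) == present for word, present in conj)
        for conj in rule
    )


def clusters_from_rules(docs: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, Set[str]]:
    """Each cluster is exactly the set of documents that satisfy its rule."""
    word_sets = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    return {
        name: {doc_id for doc_id, words in word_sets.items() if satisfies(words, rule)}
        for name, rule in rules.items()
    }


def purity(clusters: Dict[str, Set[str]], labels: Dict[str, str]) -> float:
    """Share of clustered documents in their cluster's majority class.

    Assumes the clusters are disjoint; documents matching no rule are ignored.
    """
    total = sum(len(members) for members in clusters.values())
    if total == 0:
        return 0.0
    majority = sum(
        Counter(labels[d] for d in members).most_common(1)[0][1]
        for members in clusters.values()
        if members
    )
    return majority / total


# Hypothetical toy corpus with reference class labels.
docs = {
    "d1": "the striker scored a goal",
    "d2": "the keeper saved a penalty",
    "d3": "shares fell as the market closed",
    "d4": "the central bank raised rates",
    "d5": "the market cheered the goal of lower rates",
}
labels = {"d1": "sport", "d2": "sport", "d3": "finance", "d4": "finance", "d5": "finance"}

# RGC-D-style rules: plain disjunctions of word-presence conditions.
rules_d = {
    "sport": [[("goal", True)], [("penalty", True)]],
    "finance": [[("shares", True)], [("bank", True)]],
}
# RGC-style rules: conjunctions may also require a word to be absent.
rules_c = {
    "sport": [[("goal", True), ("market", False)], [("penalty", True)]],
    "finance": [[("shares", True)], [("bank", True)], [("market", True)]],
}

clusters_d = clusters_from_rules(docs, rules_d)
clusters_c = clusters_from_rules(docs, rules_c)
print(clusters_d, purity(clusters_d, labels))
print(clusters_c, purity(clusters_c, labels))
```

In this toy run the pure disjunctions place `d5` (which mentions "goal") in the sport cluster, giving a purity of 0.8, while the rule with a negated literal separates it and reaches 1.0. This mirrors, on made-up data only, the trade-off the abstract describes between simpler rules and more accurate clusters.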
