Interpretable and reconfigurable clustering of document datasets by deriving word-based rules

Authors: Vipin Balachandran, Deepak P, Deepak Khemani

DOI: 10.1145/1645953.1646227

Abstract: Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters, and we outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop an algorithm toward the outlined goal of building such cluster models. It works by deriving rules, made up of disjunctions of conditions on the frequencies of words, that decide the membership of a document in a cluster. Each cluster consists of precisely the set of documents that satisfy the corresponding rule. We show that our approach outperforms the unsupervised decision tree approach by huge margins. Using our approach, the purity and f-measure losses incurred to achieve interpretability can be as little as 5% and 3%, respectively.
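The abstract describes each cluster as exactly the set of documents satisfying a rule that is a disjunction of conditions on word frequencies. The following is a minimal Python sketch of that membership test only, not the authors' rule-derivation algorithm. It assumes a condition has the hypothetical form "word occurs at least `min_count` times". The whitespace tokenizer and the names `Condition`, `Rule`, `tokenize`, and `assign` are also illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical condition form: `word` occurs at least `min_count` times."""
    word: str
    min_count: int = 1

    def holds(self, counts: Counter) -> bool:
        # Counter returns 0 for absent words, so missing words simply fail.
        return counts[self.word] >= self.min_count


@dataclass
class Rule:
    """A disjunction (logical OR) of word-frequency conditions."""
    conditions: list[Condition]

    def matches(self, counts: Counter) -> bool:
        return any(c.holds(counts) for c in self.conditions)


def tokenize(text: str) -> Counter:
    # Naive lowercase whitespace tokenizer; assumed for illustration only.
    return Counter(text.lower().split())


def assign(docs: dict[str, str], rules: dict[str, Rule]) -> dict[str, set[str]]:
    """Map each cluster name to the ids of documents that satisfy its rule."""
    counts = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    return {
        name: {doc_id for doc_id, c in counts.items() if rule.matches(c)}
        for name, rule in rules.items()
    }


docs = {
    "d1": "the match was postponed due to rain",
    "d2": "the striker scored a goal and the team won the match",
    "d3": "the election results and the vote count",
    "d4": "the senate vote on the new bill",
}
rules = {
    "sports": Rule([Condition("goal"), Condition("match")]),
    "politics": Rule([Condition("vote"), Condition("election")]),
}
print(sorted(assign(docs, rules)["sports"]))  # ['d1', 'd2']

# Reconfiguring a cluster means editing its rule and re-evaluating.
rules["sports"] = Rule([Condition("goal")])
print(sorted(assign(docs, rules)["sports"]))  # ['d2']
```

Because membership depends only on the rule, a user can reconfigure a cluster by editing its conditions directly, as the last lines show. This sketch reports each rule's matching set independently. It does not model how the paper treats documents that satisfy several rules or none.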

References (49)
Martin Ester, Byron J. Gao. Cluster Description Formats, Problems and Algorithms. SIAM International Conference on Data Mining, pp. 464-468, 2006.
Michael Steinbach, Levent Ertöz, Vipin Kumar. The Challenges of Clustering High Dimensional Data. Springer, Berlin, Heidelberg, pp. 273-309, 2004. DOI: 10.1007/978-3-662-08968-2_16
Sholom M. Weiss, Nitin Indurkhya. Lightweight Rule Induction. International Conference on Machine Learning, pp. 1135-1142, 2000.
Laks V.S. Lakshmanan, Raymond T. Ng, Christine Xing Wang, Xiaodong Zhou, Theodore J. Johnson. The Generalized MDL Approach for Summarization. Very Large Data Bases, pp. 766-777, 2002. DOI: 10.1016/B978-155860869-6/50073-1
Oren Etzioni, Oren Zamir, Richard M. Karp, Omid Madani. Fast and Intuitive Clustering of Web Documents. Knowledge Discovery and Data Mining, pp. 287-290, 1997.
Ryszard S. Michalski, Robert E. Stepp. Learning from Observation: Conceptual Clustering. Machine Learning, pp. 331-363, 1983. DOI: 10.1007/978-3-662-12405-5_11
Derek Greene, Pádraig Cunningham. Producing Accurate Interpretable Clusters from High-Dimensional Data. Knowledge Discovery in Databases: PKDD 2005, pp. 486-494, 2005. DOI: 10.1007/11564126_49
William W. Cohen, Yoram Singer. A Simple, Fast, and Effective Rule Learner. National Conference on Artificial Intelligence, pp. 335-342, 1999.
Vipin Kumar, Pang-Ning Tan, Michael M. Steinbach. Introduction to Data Mining, 2013.
Raghu Krishnapuram, Krishna Kummamuru. Automatic Taxonomy Generation: Issues and Possibilities. Lecture Notes in Computer Science, pp. 52-63, 2003. DOI: 10.1007/3-540-44967-1_5