Cluster Description Formats, Problems and Algorithms.

Authors: Martin Ester, Byron J. Gao

DOI:

Keywords: Algorithm, Minimum description length, Interpretability, Cluster analysis, Computer science, Cluster (physics), Data mining, Heuristic (computer science)

Abstract: Clustering is one of the major data mining tasks. So far, the database and data mining literature lacks a systematic study of cluster descriptions, which are essential to provide the user with understandable knowledge of the clusters and to support further interactive exploration. In this paper, we introduce novel description formats leading to more descriptive power. We define two alternative problems of generating cluster descriptions, Minimum Description Length and Maximum Description Accuracy, providing different trade-offs between interpretability and accuracy. We also present heuristic algorithms for both problems, together with their empirical evaluation and comparison with state-of-the-art algorithms.

References (5)
Laks V.S. Lakshmanan, Raymond T. Ng, Christine Xing Wang, Xiaodong Zhou, Theodore J. Johnson, The generalized MDL approach for summarization. Very Large Data Bases, pp. 766-777 (2002), 10.1016/B978-155860869-6/50073-1
Richard A Olshen, Charles J Stone, Leo Breiman, Jerome H Friedman, Classification and regression trees. (1983)
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98, vol. 27, pp. 94-105 (1998), 10.1145/276304.276314
Ken Q. Pu, Alberto O. Mendelzon, Concise descriptions of subsets of structured sets. Symposium on Principles of Database Systems, vol. 30, pp. 123-133 (2003), 10.1145/1061318.1061324
Tomasz Imielinski, Heikki Mannila, A database perspective on knowledge discovery. Communications of the ACM, vol. 39, pp. 58-64 (1996), 10.1145/240455.240472