Cluster Description Formats, Problems and Algorithms.

DOI:

Keywords: Algorithm, Minimum description length, Interpretability, Cluster analysis, Computer science, Cluster (physics), Data mining, Heuristic (computer science)

Abstract: Clustering is one of the major data mining tasks. So far, the database and data mining literature lacks a systematic study of cluster descriptions, which are essential to provide the user with understandable knowledge of the clusters and to support further interactive exploration. In this paper, we introduce novel description formats leading to more descriptive power. We define two alternative problems of generating cluster descriptions, Minimum Description Length and Maximum Accuracy, providing different trade-offs between interpretability and accuracy. We also present heuristic algorithms for both problems, together with their empirical evaluation and comparison with state-of-the-art algorithms.




