Turning Clusters into Patterns: Rectangle-Based Discriminative Data Description

Authors: Byron Gao, Martin Ester

DOI: 10.1109/ICDM.2006.163

Keywords: Discriminative model, Cluster analysis, Data description, Cluster (physics), Knowledge extraction, Machine learning, Rectangle, Artificial intelligence, Data mining, Computer science, Interpretability, Heuristic

Abstract: The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Yet not all methods produce such readily understandable knowledge, e.g., most clustering algorithms output sets of points as clusters. In this paper, we perform a systematic study of cluster description that generates interpretable patterns from clusters. We introduce and analyze novel formats leading to more expressive power, and motivate and define problems specifying different trade-offs between interpretability and accuracy. We also present effective heuristic algorithms together with their empirical evaluations.
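
To make the idea of a rectangle-based cluster description concrete, the following is a minimal sketch and not the authors' algorithm. It assumes axis-parallel rectangles and uses the simplest possible description, a single bounding rectangle, with precision and recall standing in for "accuracy". All function names and the sample points are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Rect = List[Tuple[float, float]]  # one (low, high) interval per dimension


def bounding_rectangle(cluster: List[Point]) -> Rect:
    """Smallest axis-parallel rectangle containing every point of the cluster."""
    dims = len(cluster[0])
    return [
        (min(p[d] for p in cluster), max(p[d] for p in cluster))
        for d in range(dims)
    ]


def covers(rect: Rect, point: Point) -> bool:
    """True if the point lies inside the rectangle (boundaries included)."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))


def describe(rect: Rect, names: Sequence[str]) -> str:
    """Render the rectangle as a human-readable conjunction of attribute ranges."""
    return " AND ".join(
        f"{lo} <= {name} <= {hi}" for (lo, hi), name in zip(rect, names)
    )


def precision_recall(
    rect: Rect, cluster: List[Point], others: List[Point]
) -> Tuple[float, float]:
    """Accuracy of the description: cluster points covered vs. outside points covered."""
    tp = sum(covers(rect, p) for p in cluster)
    fp = sum(covers(rect, p) for p in others)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(cluster)
    return precision, recall


# Hypothetical 2-D data: one cluster and two points from outside it.
cluster = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
others = [(2.5, 2.5), (5.0, 5.0)]

rect = bounding_rectangle(cluster)
pattern = describe(rect, ["x", "y"])
precision, recall = precision_recall(rect, cluster, others)
print(pattern)
print(precision, recall)
```

In this toy case the single rectangle covers every cluster point but also one outside point, which illustrates the trade-off the abstract mentions: a short description is easy to read but less accurate, while a more accurate one would need additional rectangles.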

References (17)
Martin Ester, Byron J. Gao, Cluster Description Formats, Problems and Algorithms. SIAM International Conference on Data Mining. pp. 464-468, (2006)
Laks V.S. Lakshmanan, Raymond T. Ng, Christine Xing Wang, Xiaodong Zhou, Theodore J. Johnson, The generalized MDL approach for summarization. Very Large Data Bases. pp. 766-777, (2002), 10.1016/B978-155860869-6/50073-1
Jonathan Eckstein, Peter L. Hammer, Ying Liu, Mikhail Nediak, Bruno Simeone, The Maximum Box Problem and its Application to Data Analysis. Computational Optimization and Applications. vol. 23, pp. 285-298, (2002), 10.1023/A:1020546910706
Byron J. Gao, Martin Ester, Right of Inference: Nearest Rectangle Learning Revisited. Lecture Notes in Computer Science. pp. 638-645, (2006), 10.1007/11871842_62
Richard A Olshen, Charles J Stone, Leo Breiman, Jerome H Friedman, Classification and regression trees. (1983)
Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval. (1999)
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98. vol. 27, pp. 94-105, (1998), 10.1145/276304.276314
V. S. Anil Kumar, H. Ramesh, Covering Rectilinear Polygons with Axis-Parallel Rectangles. SIAM Journal on Computing. vol. 32, pp. 1509-1541, (2003), 10.1137/S0097539799358835
Ken Q. Pu, Alberto O. Mendelzon, Concise descriptions of subsets of structured sets. Symposium on Principles of Database Systems. vol. 30, pp. 123-133, (2003), 10.1145/1061318.1061324