Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering

Authors: Hans-Peter Kriegel, Peer Kröger, Arthur Zimek

DOI: 10.14778/1454159.1454223

Abstract: As a prolific research area in data mining, subspace clustering and related problems have induced a vast amount of proposed solutions. However, many publications compare a new proposition, if at all, with one or two competitors, or even with a so-called "naive" ad hoc solution, but fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it will often remain unclear whether both solutions tackle the same problem or, if they do, whether they agree in certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this tutorial, we try to clarify (i) the different problem definitions related to subspace clustering in general, (ii) the specific difficulties encountered in this field of research, (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches, and (iv) how several prominent solutions essentially tackle different problems.
