Fast algorithms for projected clustering

Authors: Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park

DOI: 10.1145/304181.304188

Abstract: The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces, not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications, different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
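
The abstract defines projected clustering only informally. The sketch below illustrates the core idea: each cluster is scored in its own subspace. It is a minimal illustration, not the paper's algorithm, and rests on two assumptions. First, each cluster is represented by a medoid plus its own set of relevant dimensions. Second, distance is the average absolute difference over those dimensions only. The functions `segmental_distance` and `assign_projected` and the toy data are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Set

Point = Sequence[float]


def segmental_distance(x: Point, y: Point, dims: Set[int]) -> float:
    """Average absolute difference between x and y over the dimensions in dims.

    Dimensions outside dims are ignored, so noise in a cluster's
    irrelevant dimensions does not affect the distance.
    (Assumed distance measure, for illustration only.)
    """
    if not dims:
        raise ValueError("each cluster needs at least one relevant dimension")
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


def assign_projected(
    points: List[Point], medoids: List[Point], cluster_dims: List[Set[int]]
) -> List[int]:
    """Label each point with the index of its nearest medoid, where the
    distance to medoid c is measured only in cluster c's own subspace."""
    labels = []
    for p in points:
        nearest = min(
            range(len(medoids)),
            key=lambda c: segmental_distance(p, medoids[c], cluster_dims[c]),
        )
        labels.append(nearest)
    return labels


# Hypothetical 3-D toy data: cluster 0 is compact in dimensions {0, 1} and
# spread out in dimension 2; cluster 1 is compact in dimensions {1, 2} and
# spread out in dimension 0. The two clusters use different subspaces.
medoids = [(0.0, 0.0, 5.0), (5.0, 10.0, 10.0)]
cluster_dims = [{0, 1}, {1, 2}]
points = [
    (0.1, -0.2, 9.0),
    (-0.3, 0.2, 1.0),
    (9.0, 10.1, 9.8),
    (1.0, 9.9, 10.2),
]

print(assign_projected(points, medoids, cluster_dims))  # [0, 0, 1, 1]
```

This sketch takes the medoids and each cluster's dimension set as given. Choosing them is the part that an algorithmic framework for projected clustering has to solve.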
