Authors: Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park
Keywords:
Abstract: The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces, not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications, different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
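
The notion of cluster-specific dimensions in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the cluster assignment is already given, and it simply keeps, for each cluster, the dimensions along which that cluster's points are most tightly packed. The function name `cluster_specific_dims`, the parameter `dims_per_cluster`, and the toy data are hypothetical and introduced only for illustration.

```python
from statistics import mean


def cluster_specific_dims(points, labels, dims_per_cluster):
    """Pick, per cluster, the dimensions with the smallest spread.

    Illustrative assumption: spread is measured as the mean absolute
    deviation from the cluster centroid along each dimension.
    """
    # Group the points by their (already known) cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    selected = {}
    for label, members in clusters.items():
        n_dims = len(members[0])
        centroid = [mean(p[d] for p in members) for d in range(n_dims)]
        spread = [
            mean(abs(p[d] - centroid[d]) for p in members)
            for d in range(n_dims)
        ]
        # Rank dimensions from tightest to loosest and keep the tightest ones.
        ranked = sorted(range(n_dims), key=lambda d: spread[d])
        selected[label] = sorted(ranked[:dims_per_cluster])
    return selected


# Toy data: cluster "a" is tight in dimensions 0 and 1,
# cluster "b" is tight in dimensions 1 and 2.
points = [
    (1.0, 1.1, 9.0), (1.1, 0.9, 2.0), (0.9, 1.0, 5.5),
    (7.0, 5.0, 3.0), (2.0, 5.1, 3.1), (4.5, 4.9, 2.9),
]
labels = ["a", "a", "a", "b", "b", "b"]

result = cluster_specific_dims(points, labels, dims_per_cluster=2)
print(result)  # {'a': [0, 1], 'b': [1, 2]}
```

The two clusters end up with different subspaces, so no single subset of dimensions would serve both. A global feature selection step would not capture this.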