Clustering categorical data: A stability analysis framework

Authors: I.H. Jarman, T.A. Etchells, P.J.G. Lisboa, C.M. Beynon, J.D. Martin-Guerrero

DOI: 10.1109/CIDM.2011.5949452

Abstract: Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which creates difficulty in selecting the best solution for a given problem. In addition, choosing the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with 'noisy' data, since the calculation of the mode lacks the smoothing effect of the mean. This is often the case with real-world datasets, for instance in the domain of Public Health, resulting in solutions that can be radically different depending on the initialization and therefore lead to different interpretations. This paper presents two methodologies. The first addresses sensitivity to initializations using a generic landscape mapping of k-modes solutions. The second methodology utilizes the landscape map to stabilize the partition clustering of discrete data by drawing a consensus sample in order to separate signal from noise components. Results are presented for the benchmark soybean disease dataset, an artificially generated dataset and a study involving Public Health.
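The initialization sensitivity described in the abstract can be illustrated with a minimal sketch of plain k-modes (Hamming distance, per-attribute mode update) restarted from several random seeds. This is not the paper's landscape mapping or consensus procedure. The function names `hamming`, `column_modes` and `k_modes`, the toy records and the choice of ten seeds are all assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category per attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, seed, max_iter=20):
    """One k-modes run from a random initialization (hypothetical helper).

    Returns (labels, modes, cost), where cost is the total Hamming
    distance from each record to the mode of its assigned cluster.
    """
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # initial prototypes drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under Hamming distance.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each cluster's mode attribute by attribute.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# Toy categorical records (made up for this sketch).
data = [
    ("red", "round", "small"), ("red", "round", "large"), ("red", "oval", "small"),
    ("blue", "square", "large"), ("blue", "square", "small"), ("blue", "oval", "large"),
]

# Different seeds give different starting prototypes and may end in different solutions.
runs = {seed: k_modes(data, k=2, seed=seed) for seed in range(10)}
print(sorted({cost for _, _, cost in runs.values()}))
```

If the printed set holds more than one cost, the runs have converged to different local solutions, which is the selection problem the abstract raises.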

