Authors: I.H. Jarman, T.A. Etchells, P.J.G. Lisboa, C.M. Beynon, J.D. Martin-Guerrero
DOI: 10.1109/CIDM.2011.5949452
Keywords:
Abstract: Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which creates difficulty in selecting the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect of the mean. This is often the case with real-world datasets, for instance in the domain of Public Health, resulting in solutions that can be radically different depending on the initialization and can therefore lead to different interpretations. This paper presents two methodologies. The first addresses sensitivity to initializations using a generic landscape mapping of k-modes solutions. The second methodology utilizes the landscape map to stabilize the partition clustering of discrete data by drawing a consensus sample in order to separate signal from noise components. Results are presented for the benchmark soybean disease dataset, an artificially generated dataset and a study involving Public Health data.
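
To illustrate the initialization sensitivity the abstract refers to, the following is a minimal sketch of a basic k-modes procedure run from several random starts. It is not the authors' landscape mapping or consensus sampling method. The function names (`hamming`, `column_modes`, `k_modes`), the use of simple matching (Hamming) dissimilarity, the seeding scheme, and the toy records are all assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Attribute-wise mode of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, seed, max_iter=100):
    """Basic k-modes from one random initialization.

    Returns (labels, modes, cost), where cost is the total mismatch
    count between each record and the mode of its assigned cluster.
    """
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # initial prototypes drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under Hamming dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: recompute each cluster's attribute-wise mode.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old prototype if a cluster is empty
                modes[j] = column_modes(members)
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy records: three categorical attributes each.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]

# One run per seed; differing costs indicate differing local solutions.
costs = {seed: k_modes(data, k=2, seed=seed)[2] for seed in range(10)}
print(costs)
```

Comparing the cost across seeds gives a rough view of how many distinct solutions different initializations reach. Because the mode is a majority vote per attribute, a few changed records can flip a prototype value outright, which is the lack of smoothing the abstract mentions.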