Authors: Edwin Aldana-Bobadilla, Angel Kuri-Morales
DOI: 10.3390/E17010151
Keywords:
Abstract: Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They are based on the assumption that a cluster is one subset with the minimal possible degree of "disorder". They attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is based on the assumption that the elements of a cluster are "similar" to each other in accordance with some statistical measure. As a consequence of such a principle, those distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters represents a hard combinatorial problem, which disallows the use of traditional optimization techniques. Genetic algorithms are a good alternative to solve this problem. We benchmark our method relative to the best theoretical performance, which is given by the Bayes classifier when data are normally distributed, and a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform a non-supervised one, since, in the first case, the elements of the classes are known a priori. In what follows, we show that our method's effectiveness is comparable to a supervised one. This clearly exhibits the superiority of our method.
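The abstract describes the idea only at a high level: search, with a genetic algorithm, for the assignment of objects to clusters that maximizes entropy subject to a similarity condition. The sketch below illustrates that shape of search and is not the authors' formulation. It assumes, hypothetically, that the entropy is the Shannon entropy of the cluster-size distribution p_j = |C_j| / n, and that the "similar" condition means every cluster's mean squared distance to its centroid is at most a threshold `eps`. The GA is a plain textbook variant (tournament selection, uniform crossover, per-gene mutation, elitism). The names `partition_entropy`, `max_intra_spread`, `fitness`, `genetic_search`, all parameter values, and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def partition_entropy(labels: Sequence[int], k: int) -> float:
    """Shannon entropy (nats) of the cluster-size distribution p_j = |C_j| / n.

    Hypothetical stand-in for the entropy the method maximizes.
    """
    n = len(labels)
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def max_intra_spread(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Largest mean squared distance to the centroid over all non-empty clusters."""
    worst = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        spread = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members) / len(members)
        worst = max(worst, spread)
    return worst


def fitness(points: Sequence[Point], labels: Sequence[int], k: int, eps: float) -> float:
    """Entropy if the hypothetical similarity condition (spread <= eps) holds.

    Infeasible labelings score minus their violation (always < 0), so every
    feasible labeling (score >= 0) outranks every infeasible one.
    """
    spread = max_intra_spread(points, labels, k)
    if spread > eps:
        return -(spread - eps)
    return partition_entropy(labels, k)


def genetic_search(points: Sequence[Point], k: int, eps: float, pop_size: int = 40,
                   generations: int = 200, pm: float = 0.05, seed: int = 0) -> List[int]:
    """Plain GA over label vectors: tournament selection, uniform crossover,
    per-gene mutation, and elitism (the best individual is always kept)."""
    rng = random.Random(seed)
    n = len(points)

    def score(ind: Sequence[int]) -> float:
        return fitness(points, ind, k, eps)

    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]

    def tournament() -> List[int]:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best labeling forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then reassign each gene with probability pm.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [rng.randrange(k) if rng.random() < pm else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best


if __name__ == "__main__":
    # Two tight, well-separated toy groups (made-up illustration data).
    group_a = [(i * 0.1, (i % 3) * 0.1) for i in range(10)]
    group_b = [(10 + i * 0.1, 10 + (i % 3) * 0.1) for i in range(10)]
    pts = group_a + group_b
    labels = genetic_search(pts, k=2, eps=1.0)
    print(labels, fitness(pts, labels, 2, 1.0))
```

Because infeasible labelings always score below zero and feasible ones score at least zero, the search first drives the assignment toward satisfying the condition, and only then does the entropy term prefer more balanced clusters. With tight, well-separated groups and a small `eps`, any cluster that mixes groups violates the condition, so a feasible optimum keeps the groups apart.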