Author: Maria Del Mar, Suarez Alvarez
DOI:
Keywords:
Abstract: In recent times, several machine learning techniques have been applied successfully to discover useful knowledge from data. Cluster analysis, which aims at finding similar subgroups in a large heterogeneous collection of records, is one of the most useful and popular of the available data mining techniques. The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets. Most clustering algorithms are limited to either numerical or categorical attributes. Datasets with mixed types of attributes are common in real life, so the design and analysis of clustering algorithms for mixed data sets is quite timely. Determining the optimal solution to the clustering problem is NP-hard. Therefore, it is necessary to find solutions that are regarded as “good enough” quickly. Similarity is a fundamental concept in the definition of a cluster. It is very common to calculate the similarity or dissimilarity between two features using a distance measure. Attributes with large ranges will implicitly be assigned larger contributions to the metrics than attributes with small ranges. There are only a few papers especially devoted to normalisation methods. Usually the data is scaled to unit range. This does not secure equal average contributions of all features to the similarity measure. For that reason, a main part of this thesis is devoted to normalisation.
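The remark that scaling to unit range does not equalise average contributions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy data, the function names, and the choice of the mean squared pairwise difference as a proxy for one feature's average contribution to a squared Euclidean distance.

```python
from itertools import combinations


def min_max_scale(values):
    """Rescale a list of numbers linearly so that it spans [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_difference(values):
    """Average of (a - b)^2 over all unordered pairs of values.

    Used here as an assumed proxy for the feature's average
    contribution to a squared Euclidean distance.
    """
    pairs = list(combinations(values, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Hypothetical toy features: one evenly spread, one with a single outlier.
spread = [0.0, 25.0, 50.0, 75.0, 100.0]
skewed = [1.0, 1.1, 1.2, 1.3, 9.0]

scaled_spread = min_max_scale(spread)
scaled_skewed = min_max_scale(skewed)

# Both features now have unit range, yet their average contributions differ.
contribution_spread = mean_squared_difference(scaled_spread)
contribution_skewed = mean_squared_difference(scaled_skewed)

print("evenly spread feature:", contribution_spread)
print("skewed feature:       ", contribution_skewed)
```

Both scaled features span exactly the unit interval, but the two printed averages are not equal, which is the kind of imbalance the abstract attributes to plain range scaling.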