Authors: Chastine Fatichah, Dimas Ari Setyawan
DOI: 10.12962/J24068535.V18I2.A1005
Keywords:
Abstract: Classification with a decision tree is a method that includes a feature selection process. Decision tree classification based on information gain has a disadvantage when the dataset has unique attributes for each record and an imbalanced class distribution. The data used are of 2 types, numerical and nominal. The numerical type is discretized so that it yields intervals. The weaknesses of information gain can be reduced by using the dispersion ratio, which does not depend on the class distribution but on the frequency distribution. Numeric data are discretized using hierarchical clustering to obtain balanced clusters. The data in this study were taken from the UCI machine learning repository and contain two types of data, numeric and nominal. The research has two stages. First, the numeric data are discretized using hierarchical clustering with 3 methods, namely single link, complete link, and average link. Second, the discretization results are merged again, decision trees are then formed using the dispersion ratio for splitting, and the trees are evaluated with 7-fold cross-validation. The results show an increase in predictions of 14.6% compared with data without discretization. Attribute splitting with the dispersion ratio results in a prediction increase of 6.51%.
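The discretization stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `linkage_distance` and `discretize`, the fixed `n_bins` stopping rule, and the deduplication of values before clustering are all assumptions made for illustration. The dispersion ratio and the tree-building stage are not shown, since the abstract does not define them.

```python
from itertools import combinations


def linkage_distance(a, b, method):
    """Distance between two clusters of 1-D values under the given linkage."""
    dists = [abs(x - y) for x in a for y in b]
    if method == "single":      # closest pair of points
        return min(dists)
    if method == "complete":    # farthest pair of points
        return max(dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {method}")


def discretize(values, n_bins, method="average"):
    """Agglomeratively merge 1-D values into at most n_bins intervals.

    Returns one bin index per input value, with bins ordered by value.
    Assumption: duplicates are collapsed before clustering, so average
    linkage is computed over distinct values only.
    """
    clusters = [[v] for v in sorted(set(values))]
    while len(clusters) > n_bins:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    clusters.sort(key=min)
    bin_of = {v: k for k, c in enumerate(clusters) for v in c}
    return [bin_of[v] for v in values]


if __name__ == "__main__":
    sample = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8, 10.0]  # made-up numeric attribute
    for method in ("single", "complete", "average"):
        print(method, discretize(sample, 3, method))
```

On well-separated toy data like this, the three linkages produce the same intervals. They can differ on real attributes, which is presumably why the study compares all three.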