Authors: Ron Kohavi, Mehran Sahami
DOI:
Keywords: Data mining, Discretization, Decision tree learning, Entropy (classical thermodynamics), Entropy (statistical thermodynamics), Entropy (information theory), Decision tree, Computer science, Discretization error, Algorithm, Entropy (arrow of time), Discretization of continuous features, Computational complexity theory, Entropy (order and disorder), Entropy (energy dispersal)
Abstract: We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based discretization algorithm, which employs the Minimum Description Length Principle, and a recently proposed error-based technique. We evaluate these discretization methods with respect to C4.5 and Naive-Bayesian classifiers on datasets from the UCI repository and analyze the computational complexity of each method. Our results indicate that the entropy-based MDL heuristic outperforms error minimization on average. We then analyze the shortcomings of error-based approaches in comparison to entropy-based methods.