Error-based and entropy-based discretization of continuous features

Keywords: Data mining, Discretization, Decision tree learning, Entropy (classical thermodynamics), Entropy (statistical thermodynamics), Entropy (information theory), Decision tree, Computer science, Discretization error, Algorithm, Entropy (arrow of time), Discretization of continuous features, Computational complexity theory, Entropy (order and disorder), Entropy (energy dispersal)

Abstract: We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based discretization algorithm, which employs the Minimum Description Length Principle, and a recently proposed error-based technique. We evaluate these discretization methods with respect to C4.5 and Naive-Bayesian classifiers on datasets from the UCI repository and analyze the computational complexity of each method. Our results indicate that the entropy-based MDL heuristic outperforms error minimization on average. We then analyze the shortcomings of error-based approaches in comparison to entropy-based methods.
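One scenario of the kind the abstract alludes to can be sketched on a toy feature. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`entropy`, `weighted_entropy`, `errors`, `cut_costs`) and the ten-point dataset are hypothetical, only a single binary cut is considered, and the MDL stopping rule is omitted. When the majority class is the same on both sides of every candidate cut, the error count does not change with the cut, while the size-weighted class entropy still singles out one threshold.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_entropy(labels):
    """Entropy scaled by interval size, so interval costs can be summed."""
    return len(labels) * entropy(labels)


def errors(labels):
    """Misclassifications if the interval predicts its majority class."""
    return len(labels) - max(Counter(labels).values())


def cut_costs(values, labels, cost):
    """Return (cut_point, total_cost) for every candidate binary cut.

    Candidate cuts are midpoints between distinct adjacent sorted values.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    results = []
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        cut = (xs[i - 1] + xs[i]) / 2
        results.append((cut, cost(ys[:i]) + cost(ys[i:])))
    return results


# Hypothetical toy data: a pure region of class "a" followed by a mixed region.
values = list(range(1, 11))
labels = ["a"] * 6 + ["b", "a", "b", "a"]

error_costs = cut_costs(values, labels, errors)
entropy_costs = cut_costs(values, labels, weighted_entropy)

# Every cut leaves the same number of errors as not cutting at all.
print(sorted({c for _, c in error_costs}), errors(labels))
# The entropy criterion prefers the cut that isolates the pure region.
print(min(entropy_costs, key=lambda t: t[1]))
```

On this data the error criterion gives no reason to prefer any cut over leaving the feature undivided, whereas the entropy criterion places the threshold at the boundary of the pure region. A downstream learner such as Naive-Bayes can still use that interval structure, since it works with class proportions rather than majority votes alone.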
