Improve the quality of supervised discretization of continuous valued attributes in data mining

Author: Dewan Md. Farid

DOI: 10.1109/ICCITECHN.2011.6164874

Keywords: Discretization of continuous features, Machine learning, Interval (mathematics), Computer science, Naive Bayes classifier, Decision tree learning, Decision tree, Heuristic, Benchmark (computing), Artificial intelligence, Discretization, Data mining

Abstract: Dealing with continuous-valued attributes is an important data mining problem that affects the accuracy, complexity, and understandability of the algorithms. This paper presents a new approach for dealing with continuous-valued attributes to improve the quality of discretization as a preprocessing step for decision tree and naive Bayesian classifiers. The proposed approach focuses on supervised discretization; however, unsupervised discretization can also be applied in the same way. It finds the possible cut points in the continuous attribute values that separate the class distributions, and then considers the best cut point as an interval border using the information gain heuristic. The approach has been tested by comparing it with other methods on a number of benchmark problems from the UCI machine learning repository. The experimental results proved that the proposed approach improves the quality of discretization.
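The two steps named in the abstract (collect candidate cut points where the class distribution changes, then pick the one with the highest information gain) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `entropy`, `candidate_cut_points`, `information_gain`, and `best_cut_point` are hypothetical, the use of midpoints between adjacent distinct values is an assumption, and only a single binary cut is selected rather than a full set of intervals.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_cut_points(values, labels):
    """Midpoints between adjacent distinct values where the class can change.

    Assumption: a boundary is kept unless every example on both
    neighbouring values belongs to one and the same class.
    """
    classes_at = {}
    for value, label in zip(values, labels):
        classes_at.setdefault(value, set()).add(label)
    distinct = sorted(classes_at)
    cuts = []
    for low, high in zip(distinct, distinct[1:]):
        if len(classes_at[low] | classes_at[high]) > 1:
            cuts.append((low + high) / 2)
    return cuts


def information_gain(values, labels, cut):
    """Entropy reduction obtained by splitting the examples at `cut`."""
    left = [c for v, c in zip(values, labels) if v <= cut]
    right = [c for v, c in zip(values, labels) if v > cut]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_cut_point(values, labels):
    """Candidate cut with the highest information gain, or None if there is none."""
    cuts = candidate_cut_points(values, labels)
    if not cuts:
        return None
    return max(cuts, key=lambda cut: information_gain(values, labels, cut))


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from the paper.
    values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_cut_point(values, labels))
```

Restricting the search to boundaries where the class changes keeps the number of information gain evaluations small. Applying `best_cut_point` recursively to each resulting interval, with some stopping rule, would be one way to obtain multiple intervals, though the abstract does not say which stopping rule the paper uses.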

References (15)
Thierry Van de Merckt. Decision Trees in Numerical Attribute Spaces. International Joint Conference on Artificial Intelligence, pp. 1016-1021 (1993).
Michael J. Pazzani. An iterative improvement approach for the discretization of numeric attributes in Bayesian classifiers. Knowledge Discovery and Data Mining, pp. 228-233 (1995).
Pat Langley. Induction of Recursive Bayesian Classifiers. European Conference on Machine Learning, pp. 153-164 (1993). DOI: 10.1007/3-540-56602-3_134
Keki B. Irani, Usama M. Fayyad. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022-1027 (1993).
Petra Perner, Sascha Trautzsch. Multi-interval Discretization Methods for Decision Tree Learning. Lecture Notes in Computer Science, pp. 475-482 (1998). DOI: 10.1007/BFB0033269
James Dougherty, Ron Kohavi, Mehran Sahami. Supervised and Unsupervised Discretization of Continuous Features. Machine Learning Proceedings 1995, pp. 194-202 (1995). DOI: 10.1016/B978-1-55860-377-6.50032-3
J. R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, vol. 4, pp. 77-90 (1996). DOI: 10.1613/JAIR.279
Stephen D. Bay. Multivariate discretization of continuous variables for set mining. Knowledge Discovery and Data Mining, pp. 315-319 (2000). DOI: 10.1145/347090.347159
Usama M. Fayyad, Keki B. Irani. On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, vol. 8, pp. 87-102 (1992). DOI: 10.1023/A:1022638503176
Bernhard Pfahringer. Compression-Based Discretization of Continuous Attributes. Machine Learning Proceedings 1995, pp. 456-463 (1995). DOI: 10.1016/B978-1-55860-377-6.50063-3