Cost sensitive discretization of numeric attributes

Authors: Tom Brijs, Koen Vanhoof

DOI: 10.1007/BFB0094810

Abstract: Many algorithms in decision tree learning are not designed to handle numeric-valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost-sensitive discretization as a preprocessing step to the induction of a classifier and as an elaboration of the error-based discretization method, in order to obtain an optimal multi-interval splitting for each numeric attribute. A transparent description of the method and the steps involved in cost-sensitive discretization is given. We also evaluate its performance against two other well-known methods, i.e. entropy-based discretization and pure error-based discretization, on a real-life financial dataset. From the algorithmic point of view, we show that an important deficiency of error-based discretization methods can be solved by introducing costs. From the application point of view, we discovered that using a discretization method is recommended. To conclude, we use ROC curves to illustrate that under particular conditions cost-based discretization may be optimal.
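The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a cost-based variant of error-based discretization could look like, not the authors' procedure. It assumes a binary class label (1 = positive), two hypothetical misclassification costs `cost_fn` (a positive labelled negative) and `cost_fp` (a negative labelled positive), and a dynamic program over boundaries between distinct attribute values. The function name `min_cost_split` and all parameters are illustrative assumptions.

```python
from itertools import accumulate


def min_cost_split(values, labels, cost_fn=1.0, cost_fp=1.0, max_intervals=3):
    """Return (total_cost, cut_points) for the cheapest split of one numeric
    attribute into at most `max_intervals` intervals (illustrative sketch)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # pos[i] = number of positive examples among the first i sorted examples.
    pos = [0] + list(accumulate(lab for _, lab in pairs))
    # Cuts are only allowed between two different attribute values.
    bounds = [0] + [i for i in range(1, n) if pairs[i - 1][0] != pairs[i][0]] + [n]

    def interval_cost(a, b):
        # Each interval receives the class label that is cheapest to assign.
        n_pos = pos[b] - pos[a]
        n_neg = (b - a) - n_pos
        return min(n_pos * cost_fn, n_neg * cost_fp)

    m = len(bounds)
    inf = float("inf")
    # best[k][j]: minimal cost of covering examples [0, bounds[j]) with k intervals.
    best = [[inf] * m for _ in range(max_intervals + 1)]
    back = [[None] * m for _ in range(max_intervals + 1)]
    best[0][0] = 0.0
    for k in range(1, max_intervals + 1):
        for j in range(1, m):
            for i in range(j):
                if best[k - 1][i] == inf:
                    continue
                c = best[k - 1][i] + interval_cost(bounds[i], bounds[j])
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Prefer the smallest number of intervals among equally cheap solutions.
    k = min(range(1, max_intervals + 1), key=lambda q: (best[q][m - 1], q))
    total = best[k][m - 1]
    cuts, j = [], m - 1
    while k > 0:
        i = back[k][j]
        if i > 0:
            # Place the cut point midway between the two neighbouring values.
            cuts.append((pairs[bounds[i] - 1][0] + pairs[bounds[i]][0]) / 2)
        j, k = i, k - 1
    return total, sorted(cuts)
```

As a usage illustration on made-up data, take `values = [1,1,1,2,2,2,3,3,3,4,4,4]` with `labels = [0,0,0,0,0,0,0,0,1,0,0,1]`. With equal costs every interval has the same majority class, so the sketch returns no cut point at all. With `cost_fn=3.0` it returns a single cut at 2.5, because separating the region that contains the costly positives now lowers the total cost.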
