General and Efficient Multisplitting of Numerical Attributes

Authors:

DOI: 10.1023/A:1007674919412

Keywords:

Abstract: Often in supervised learning, numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees that an optimal multi-partition of an arbitrary domain is defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined when multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error, as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure, are all well-behaved. We also devise a method for finding optimal multisplits efficiently by examining the minimum number of boundary point combinations required to produce partitions that are optimal with respect to a well-behaved evaluation function. Our empirical experiments validate the utility of optimal multisplitting: it consistently produces better partitions than alternative approaches while requiring only comparable time. In top-down induction of decision trees, the choice of evaluation function has a more decisive effect on the result than the partitioning strategy; optimizing the value of most evaluation functions does not raise the accuracy of the produced trees. In our tests, the tree construction time using optimal multisplitting was, on average, twice that of greedy multisplitting, which for its part was on average twice as slow as binary splitting.
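The core idea of restricting candidate cut points to boundary points can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's algorithm: it treats a point as a boundary point when two adjacent distinct attribute values carry differing class labels (a simplification of the full class-distribution criterion), and it shows the binary-split special case scored by Information Gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def boundary_points(values, labels):
    """Candidate cut points between adjacent examples with distinct
    attribute values and differing class labels (simplified criterion)."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 != v2 and c1 != c2:
            cuts.append((v1 + v2) / 2)  # midpoint as the cut point
    return cuts

def best_binary_split(values, labels):
    """Return (cut, gain) for the boundary point maximizing Information Gain.

    Only boundary points are examined; for a well-behaved evaluation
    function, this restriction loses no optimal partition.
    """
    base = entropy(labels)
    best = None
    for cut in boundary_points(values, labels):
        left = [c for v, c in zip(values, labels) if v <= cut]
        right = [c for v, c in zip(values, labels) if v > cut]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(labels)
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best
```

For a cleanly separable attribute such as values `[1, 2, 3, 4, 5, 6]` with labels `a, a, a, b, b, b`, only the single cut at 3.5 is ever evaluated, rather than all five midpoints.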
