A novel pruning approach using expert knowledge for data-specific pruning

Authors: Ali Mirza Mahmood, Mrithyumjaya Rao Kuppa

DOI: 10.1007/S00366-011-0214-1

Abstract: Classification is an important data mining task that discovers hidden knowledge from labeled datasets. Most approaches to pruning assume that all datasets are equally uniform and equally important, so they apply equal pruning to all of them. However, in real-world classification problems, datasets are not all equal, and not considering this when setting the pruning rate tends to generate a decision tree with large size and a high misclassification rate. We approach the problem by first investigating the properties of each dataset and then deriving a data-specific pruning value using expert knowledge, which is used to design pruning techniques that prune decision trees close to perfection. An efficient pruning algorithm dubbed EKBP is proposed; it is very general, as we are free to use any learning algorithm as the base classifier. We have implemented our solution and experimentally verified its effectiveness on forty real-world benchmark datasets from the UCI machine learning repository. In these experiments, the proposed approach shows it can dramatically reduce tree size while enhancing or retaining the level of accuracy.
