Using Resampling Techniques for Better Quality Discretization

Authors: Taimur Qureshi, Djamel A. Zighed

DOI: 10.1007/978-3-642-03070-3_6

Abstract: Many supervised induction algorithms require discrete data; however, real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the accuracy, complexity, variance, and understandability of the induction model. Usually, discretization and other types of statistical processes are applied to subsets of the population, as the entire population is practically inaccessible. For this reason, we argue that discretization performed on a sample of the population is only an estimate of the entire population. Most existing discretization methods partition the attribute range into two or several intervals using a single cut point or a set of cut points. In this paper, we introduce variants of a resampling technique (such as the bootstrap) to generate a set of candidate discretization points, thus improving discretization quality by providing a better estimation towards the entire population. The goal of this paper is therefore to observe whether this type of resampling can lead to better quality discretization points, which opens up a new paradigm for the construction of soft decision trees.
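The general idea of using resampling to generate candidate cut points can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the variants or the splitting criterion, so the code below assumes a simple entropy-minimizing binary split applied to each bootstrap resample. The function names (`entropy`, `best_cut`, `bootstrap_cut_points`), the parameter `n_resamples`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_cut(values, labels):
    """Return the binary cut point with the lowest weighted class entropy.

    Assumed criterion for this sketch; returns None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_point = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score, best_point = score, (lo + hi) / 2
    return best_point


def bootstrap_cut_points(values, labels, n_resamples=200, seed=0):
    """Collect one candidate cut point per bootstrap resample of the sample."""
    rng = random.Random(seed)
    n = len(values)
    cuts = []
    for _ in range(n_resamples):
        # Draw n indices with replacement (a standard bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        cut = best_cut([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:
            cuts.append(cut)
    return cuts


# Toy data (hypothetical): one continuous attribute and two classes.
values = [1, 2, 3, 4, 6, 7, 8, 9]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
cuts = bootstrap_cut_points(values, labels)
```

Rather than committing to the single cut point found on the original sample, the resulting list `cuts` gives a distribution of candidate cut points, whose spread indicates how sensitive the cut is to the particular sample drawn.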
