Contributions to decision tree based learning

作者: Taimur Qureshi

DOI:

关键词:

摘要: Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data learning techniques which aim at producing high-level information, or models, from data. A Typical knowledge discovery process consists of data selection, data preparation, data transformation, data mining and interpretation/validation of the results. Thus, we develop automatic learning techniques which contribute to the data preparation, transformation and mining tasks of knowledge discovery. In doing so, we try to improve the prediction accuracy of the overall learning process. Our work focuses on decision tree based learning and thus, we introduce various preprocessing and transformation techniques such as discretization, fuzzy partitioning and dimensionality reduction to improve this type of learning. However, these techniques can be used in other learning methods e.g. discretization can also be used for naive-bayes classifiers. The data preparation step represents almost 80 percent of the problem and is both time consuming and critical for the quality of modeling. Discretization of continuous features is an important problem that has effects on accuracy, complexity, variance and understandability of the induction models. In this thesis, we propose and develop resampling based aggregation techniques that improve the quality of discretization. Later, we validate by comparing with other discretization techniques and with an optimal partitioning method on 10 benchmark data sets.The second part of our thesis concerns with automatic fuzzy partitioning for soft decision tree induction. Soft or fuzzy …

参考文章(0)