Authors: Salvador Garcia, J. Luengo, José Antonio Sáez, Victoria López, F. Herrera
DOI: 10.1109/TKDE.2012.35
Keywords: Machine learning, Categorization, Computer science, Knowledge extraction, Artificial intelligence, Taxonomy (general), Data pre-processing, Data set, Data mining, Decision tree, Set (abstract data type), Categorical variable, Supervised learning, Discretization
Abstract: Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied over continuous data, and the representation of information is simplified, making it more concise and specific. The literature provides numerous proposals of discretization, and some attempts to categorize them into a taxonomy can be found. However, in previous papers, there is a lack of consensus in the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers have been widely considered, while many other methods have gone unnoticed. With the intention of alleviating these problems, this paper provides a survey of discretization methods proposed in the literature from a theoretical and empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all the known methods up to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. The results of their performances, measured in terms of accuracy, number of intervals, and inconsistency, have been verified by means of nonparametric statistical tests. Additionally, a set of discretizers are highlighted as the best performing ones.
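
The core idea in the abstract, mapping continuous values to categorical labels through intervals, can be illustrated with a minimal sketch. It assumes equal-width binning, chosen here only because it is the simplest scheme to show; it is not a method the paper singles out. The function names and the sample `ages` data are hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width partition."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval that contains it."""
    # bisect_right counts how many cut points are <= v, which is the interval index.
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical continuous attribute (illustrative data only).
ages = [18.0, 22.5, 31.0, 47.0, 52.5, 68.0]
cuts = equal_width_cut_points(ages, 3)   # two interior cut points
labels = discretize(ages, cuts)          # one interval index per value
print(cuts, labels)
```

The number of intervals (here 3) is one of the three measures the abstract mentions, alongside accuracy and inconsistency. Supervised discretizers would choose the cut points using class labels rather than a fixed width.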