Author: S.A. Gansky
DOI: 10.1177/154407370301700125
Keywords: Lift (data mining), Decision tree, Regression, Covariate, Regression analysis, Logistic regression, Data mining, Sample size determination, Computer science, Knowledge extraction
Abstract: Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratios. To aid interpretation, various plots of predicted probabilities are used, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares logistic regression with the KDD methods (CART and ANN) in analyzing data from the Rochester study. With careful analysis, a sufficient sample size, and proper competitors, the problems of naive analyses (Schwarzer et al., 2000) can be avoided.
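The validation measures named in the abstract (sensitivity, specificity, likelihood ratios, concordance, lift, and cumulative captured response) can be illustrated with a small sketch. This is not the paper's analysis: the function names, the 0.5 probability cutoff, the 30% top fraction, and the ten-subject toy data are all hypothetical, chosen only to show how each measure is computed from observed outcomes and predicted probabilities (such as those from a logistic, CART, or ANN model).

```python
from itertools import product


def confusion_metrics(y_true, p_hat, threshold=0.5):
    """Sensitivity, specificity and likelihood ratios at a probability cutoff.

    Hypothetical helper; assumes at least one case (y=1) and one non-case (y=0).
    """
    pairs = list(zip(y_true, p_hat))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ = sens / (1 - spec); LR- = (1 - sens) / spec
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return sens, spec, lr_pos, lr_neg


def concordance(y_true, p_hat):
    """c-statistic: share of (case, non-case) pairs in which the case has the
    higher predicted probability; tied pairs count as 1/2."""
    cases = [p for y, p in zip(y_true, p_hat) if y == 1]
    controls = [p for y, p in zip(y_true, p_hat) if y == 0]
    score = 0.0
    for pc, pn in product(cases, controls):
        if pc > pn:
            score += 1.0
        elif pc == pn:
            score += 0.5
    return score / (len(cases) * len(controls))


def lift_and_captured(y_true, p_hat, fraction):
    """Lift and cumulative captured response for the top `fraction` of
    subjects ranked by predicted probability (highest first)."""
    ranked = sorted(zip(p_hat, y_true), key=lambda t: t[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    hits_top = sum(y for _, y in ranked[:k])
    total_hits = sum(y_true)
    overall_rate = total_hits / len(y_true)
    lift = (hits_top / k) / overall_rate  # response rate in top k vs overall
    captured = hits_top / total_hits      # share of all cases found in top k
    return lift, captured


# Hypothetical toy data: observed outcome (1 = event) and model probability.
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.65, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]

sens, spec, lr_pos, lr_neg = confusion_metrics(y, p, threshold=0.5)
c_stat = concordance(y, p)
lift, captured = lift_and_captured(y, p, fraction=0.3)
print(f"sens={sens:.3f} spec={spec:.3f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
print(f"c={c_stat:.3f} lift@30%={lift:.2f} captured@30%={captured:.2f}")
```

Evaluating `lift_and_captured` over a grid of fractions (for example 0.1, 0.2, ..., 1.0) yields the points of a lift chart and a cumulative captured-response plot, and sweeping `threshold` in `confusion_metrics` traces the ROC curve, whose area equals the concordance computed above. In practice these quantities would be computed on a validation sample rather than on the data used to fit the model.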