Segmented Regression Estimators for Massive Data Sets.

Authors: Edwin P. D. Pednault, Ramesh Natarajan

Keywords: Proper linear model, Computer science, Nonparametric regression, Logistic model tree, Polynomial regression, Categorical variable, Data mining, Local regression, Regression diagnostic, Segmented regression

Abstract: We describe two methodologies for obtaining segmented regression estimators from massive training data sets. The first methodology, called Linear Regression Tree (LRT), is used for continuous response variables, and the second, complementary methodology, called Naive Bayes Tree (NBT), is used for categorical response variables. These are implemented in the IBM ProbE (Probabilistic Estimation) data mining engine, which is an object-oriented framework for building classes of predictive models. Based on this methodology, an application called ATM-SE™ for direct-mail targeted marketing has been developed jointly with Fingerhut Business Intelligence [1].
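
To make the idea of a segmented regression estimator concrete, the following is a minimal sketch of a one-split "linear regression tree" on a single feature: the data are partitioned at a threshold, and a separate least-squares line is fitted in each segment. It is an illustration only and is not the ProbE implementation; the function names (`fit_line`, `best_split`, `predict`), the exhaustive threshold search, and the toy data are all assumptions made for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a constant model if x has no spread in this segment.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def best_split(xs, ys, min_leaf=2):
    """Try every threshold on x and keep the one that minimizes
    the summed squared error of the two per-segment linear fits."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal x values
        la, lb, lsse = fit_line(*zip(*pairs[:i]))
        ra, rb, rsse = fit_line(*zip(*pairs[i:]))
        total = lsse + rsse
        if best is None or total < best["sse"]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = {"threshold": threshold, "left": (la, lb),
                    "right": (ra, rb), "sse": total}
    return best

def predict(model, x):
    """Route x to its segment and apply that segment's linear model."""
    a, b = model["left"] if x <= model["threshold"] else model["right"]
    return a + b * x

# Hypothetical toy data: slope +1 for x < 5, slope -2 from x = 5 onward.
xs = list(range(10))
ys = [float(x) if x < 5 else 20.0 - 2.0 * x for x in xs]
model = best_split(xs, ys)
```

On this toy data a single global line fits poorly, while the two-segment model recovers each regime separately. A method aimed at massive data sets would need to avoid the repeated refitting done here, for example by accumulating sufficient statistics in a single pass; this sketch makes no attempt at that.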

References (16)
Ron Kohavi, Mehran Sahami. Error-based and entropy-based discretization of continuous features. Knowledge Discovery and Data Mining, pp. 114-119 (1996).
Hung-Ju Huang, Tzu-Tsung Wong. Why Discretization Works for Naive Bayesian Classifiers. International Conference on Machine Learning, pp. 399-406 (2000).
Steven L. Salzberg, Alberto Segre. Programs for Machine Learning (1994).
Ron Kohavi. Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. Knowledge Discovery and Data Mining, pp. 202-207 (1996).
Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman. Classification and Regression Trees (1983).
Igor Kononenko. Semi-naive Bayesian classifier. Lecture Notes in Computer Science, pp. 206-219 (1991). doi:10.1007/BFB0017015
Pat Langley, Stephanie Sage. Induction of Selective Bayesian Classifiers. Uncertainty Proceedings 1994, pp. 399-406 (1994). doi:10.1016/B978-1-55860-332-5.50055-9
George H. John, Pat Langley. Estimating continuous distributions in Bayesian classifiers. Uncertainty in Artificial Intelligence, pp. 338-345 (1995).
C. Apte, E. Bibelnieks, R. Natarajan, E. Pednault, F. Tipu, D. Campbell, B. Nelson. Segmentation-based modeling for advanced targeted marketing. Knowledge Discovery and Data Mining, pp. 408-413 (2001). doi:10.1145/502512.502573
David E. Hapeman. Categorical Data Analysis. Technometrics, vol. 33, p. 241 (1991). doi:10.1080/00401706.1991.10484817