Authors: Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König
Keywords:
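Abstract: Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs, we review the classification problem and then dichotomous probability estimation. Next, we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables, we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077).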
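
The abstract's point that k-NN probability estimation is a special case of nonparametric regression can be illustrated with a minimal sketch: with a 0/1 outcome, the estimate of P(Y = 1 | X = x) is simply the mean outcome among the k nearest training points. The function name `knn_probability`, the Euclidean distance, and the toy data are assumptions made for illustration; they are not taken from the paper or its companion article.

```python
import math

def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | X = query) as the mean of the 0/1 outcomes
    of the k training points nearest to `query` (Euclidean distance)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Average the binary outcomes of the k nearest neighbors.
    return sum(train_y[i] for i in order[:k]) / k

# Toy data (made up for illustration): one feature, outcome is 1 for larger x.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
train_y = [0, 0, 0, 1, 1, 1]

p_low = knn_probability(train_x, train_y, (0.5,), k=3)   # all three neighbors have y = 0
p_mid = knn_probability(train_x, train_y, (2.4,), k=3)   # one of three neighbors has y = 1
p_high = knn_probability(train_x, train_y, (4.5,), k=3)  # all three neighbors have y = 1
```

In this sketch the choice of k acts as the tuning parameter: a small k gives coarse estimates (multiples of 1/k), while a larger k smooths over a wider neighborhood.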