Calibration of probability predictions from machine‐learning and statistical models

Author: Carsten F. Dormann

DOI: 10.1111/GEB.13070

Abstract: AIM: Predictions from statistical models may be uncalibrated, meaning that the predicted values do not have the nominal coverage probability. This is most easily seen with probability predictions in machine‐learning classification, including the common species occurrence probabilities. Here, a predicted probability of, say, .7 should indicate that out of 100 cases with these environmental conditions, and hence the same predicted probability, the species should be present in 70 and absent in 30. INNOVATION: A simple calibration plot shows that this is not necessarily the case, particularly not for overfitted models or algorithms that use non‐likelihood target functions. As a consequence, ‘raw’ predictions from such a model could easily be off by .2, are unsuitable for averaging across model types, and the resulting maps are substantially distorted. The solution, a flexible calibration regression, can be applied whenever deviations are observed. MAIN CONCLUSIONS: ‘Raw’, uncalibrated probability predictions should be calibrated before interpreting or averaging them in a probabilistic way.
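The two ideas named in the abstract, a calibration plot and a calibration regression, can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's own code: the data are simulated, the "raw" predictions are made overconfident on purpose, and the "flexible" regression is stood in for by a logistic regression on a quadratic polynomial of the logit of the raw prediction. All function names (`calibration_table`, `fit_calibration`, `apply_calibration`, `mean_abs_gap`) are hypothetical.

```python
import numpy as np

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    return np.log(p / (1 - p))

def expit(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35, 35)))

def calibration_table(p, y, n_bins=10):
    """Rows of (mean predicted prob., observed frequency, count) per equal-width bin.
    Plotting column 1 against column 0 gives a calibration plot."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p, edges[1:-1])      # bin index in 0 .. n_bins-1
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p[mask].mean(), y[mask].mean(), mask.sum()))
    return np.array(rows, dtype=float)

def mean_abs_gap(table):
    """Count-weighted mean |predicted - observed| over bins."""
    return np.average(np.abs(table[:, 0] - table[:, 1]), weights=table[:, 2])

def fit_calibration(p_raw, y, degree=2, n_iter=50, ridge=1e-8):
    """Assumed stand-in for a flexible calibration regression:
    logistic regression of y on powers of logit(p_raw), fitted by Newton-Raphson."""
    X = np.vander(logit(p_raw), degree + 1, increasing=True)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        w = mu * (1 - mu)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def apply_calibration(p_raw, beta):
    X = np.vander(logit(p_raw), len(beta), increasing=True)
    return expit(X @ beta)

# Simulated example (assumption: overconfident raw predictions).
rng = np.random.default_rng(42)
n = 20000
true_logit = rng.normal(0.0, 1.5, size=n)
y = rng.binomial(1, expit(true_logit)).astype(float)   # presence/absence
p_raw = expit(2.5 * true_logit)                        # too extreme on purpose

# Fit the calibration on one half, evaluate on the other half.
fit, test = np.arange(n) < n // 2, np.arange(n) >= n // 2
beta = fit_calibration(p_raw[fit], y[fit])
p_cal = apply_calibration(p_raw[test], beta)

gap_raw = mean_abs_gap(calibration_table(p_raw[test], y[test]))
gap_cal = mean_abs_gap(calibration_table(p_cal, y[test]))
print(f"mean |predicted - observed|, raw:        {gap_raw:.3f}")
print(f"mean |predicted - observed|, calibrated: {gap_cal:.3f}")
```

In this simulated setting, a raw prediction of .9 corresponds to a true occurrence probability of roughly .7, which is the kind of deviation the abstract describes. Fitting the calibration on data not used for evaluation avoids judging the correction on the data it was tuned to.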
