Authors: Rui F. Fernandes, Daniel Scherrer, Antoine Guisan
DOI: 10.1111/DDI.12868
Keywords:
Abstract: Aim: Species distribution information is essential under increasing global changes, and models can be used to acquire such information, but they can be affected by different errors/bias. Here, we evaluated the degree to which errors in species data (false presences–absences) affect model predictions and how this is reflected in commonly used evaluation metrics. Location: Western Swiss Alps. Methods: Using 100 virtual species and different sampling methods, we created observation datasets of different sizes (100–400–1,600) and added different levels of errors (creating false positives or false negatives; from 0% to 50%). These degraded datasets were used to fit models using a generalized linear model, random forest and boosted regression trees. Model fit (ability to reproduce the calibration data) and predictive success (ability to predict the true distribution) were measured on probabilistic/binary outcomes using Kappa, TSS, MaxKappa, MaxTSS and Somers’D (rescaled AUC). Results: The interpretation of the models’ performance depended on the data and metrics used to evaluate them, with conclusions differing depending on whether model fit or predictive success was measured. Added errors reduced model performance, with effects expectedly decreasing as sample size increased. Model performance was more affected by false positives than by false negatives. Models fitted with different techniques were affected differently by errors: models with high fit presented lower predictive success (RFs), and vice versa (GLMs). High evaluation metrics could still be obtained with 30% error added, indicating that some metrics (Somers’D) might not be sensitive enough to detect data degradation. Main conclusions: Our findings highlight the need to reconsider the interpretation scale of some commonly used evaluation metrics: Kappa seems more realistic than Somers’D/AUC or TSS. High fits were obtained with high levels of error added, showing that RF overfits the data. When collecting occurrence databases, it is advisable to reduce the rate of false positives (or increase sample sizes) rather