Authors: Mo Zhang, Wenjiao Shi, Ziwei Xu
DOI: 10.5194/HESS-24-2505-2020
Keywords: Spearman's rank correlation coefficient, Explained sum of squares, Silt, Mathematics, Soil texture, Soil test, Statistics, Interpolation, Random forest, Mean squared error
Abstract: Soil texture and soil particle size fractions (PSFs) play an increasing role in physical, chemical, and hydrological processes. Many previous studies have used machine-learning and log-ratio transformation methods for soil texture classification and soil PSF interpolation to improve the prediction accuracy. However, few reports have systematically compared their performance with respect to both classification and interpolation. Here, five machine-learning models (K-nearest neighbour (KNN), multilayer perceptron neural network (MLP), random forest (RF), support vector machines (SVM), and extreme gradient boosting (XGB)), combined with the original data and three log-ratio transformation methods (additive log ratio (ALR), centred log ratio (CLR), and isometric log ratio (ILR)), were applied to evaluate soil texture and PSFs using both raw and log-ratio-transformed data from 640 soil samples in the Heihe River basin (HRB) in China. The results demonstrated that the log-ratio transformations decreased the skewness of the soil PSF data. For soil texture classification, RF and XGB showed better performance, with a higher overall accuracy and kappa coefficient. They were also recommended for evaluating the classification capacity on imbalanced data according to the area under the precision–recall curve (AUPRC). For soil PSF interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root-mean-square error (RMSE; sand had an RMSE of 15.09 %, silt was 13.86 %, and clay was 6.31 %), mean absolute error (MAE; sand had an MAE of 10.65 %, silt was 9.99 %, and clay was 5.00 %), Aitchison distance (AD; 0.84), and standardized residual sum of squares (STRESS; 0.61), and the highest Spearman rank correlation coefficient (RCC; sand was 0.69, silt was 0.67, and clay was 0.69). STRESS was improved by using log-ratio methods, especially CLR and ILR. Prediction maps from both direct and indirect classification were similar in the middle and upper reaches of the HRB. However, indirect classification maps using log-ratio-transformed data provided more detailed information in the lower reaches of the HRB. There was a pronounced improvement of 21.3 % in the kappa coefficient when indirect methods were used for soil texture classification compared with direct methods. RF was recommended as the best strategy among the five machine-learning models, based on the accuracy evaluation of the soil PSF interpolation and soil texture classification, and ILR was recommended for component-wise machine-learning models without multivariate treatment, considering the constrained nature of compositional data. In addition, XGB was preferred over the other four models when the trade-off between accuracy and runtime was considered. Our findings provide a reference for future works on the spatial prediction of soil PSFs and texture using machine-learning models with skewed distributions of soil PSF data over a large area.
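The abstract names the three log-ratio transformations (ALR, CLR, ILR) and the Aitchison distance without defining them. The following is a minimal sketch of their standard textbook definitions in NumPy, not the authors' implementation: the function names, the choice of ILR basis (a Helmert-type orthonormal basis), and the two sand/silt/clay samples are illustrative assumptions. It also assumes strictly positive parts; zero replacement is not handled.

```python
import numpy as np

def closure(x):
    """Rescale each composition so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio: log of each part divided by the last part."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log ratio: log parts minus their mean (log geometric mean)."""
    log_x = np.log(closure(x))
    return log_x - log_x.mean(axis=-1, keepdims=True)

def ilr_basis(n_parts):
    """Orthonormal Helmert-type basis (an assumed, illustrative choice)."""
    basis = np.zeros((n_parts, n_parts - 1))
    for i in range(1, n_parts):
        basis[:i, i - 1] = 1.0 / i      # first i parts share equal weight
        basis[i, i - 1] = -1.0          # contrasted against part i + 1
        basis[:, i - 1] *= np.sqrt(i / (i + 1))  # normalise column to unit length
    return basis

def ilr(x):
    """Isometric log ratio: CLR coordinates projected onto the basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1])

def ilr_inverse(z):
    """Map ILR coordinates back to a composition that sums to 1."""
    z = np.asarray(z, dtype=float)
    return closure(np.exp(z @ ilr_basis(z.shape[-1] + 1).T))

def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Hypothetical sand/silt/clay percentages, for illustration only.
psf = np.array([[40.0, 40.0, 20.0],
                [70.0, 20.0, 10.0]])
z = ilr(psf)                    # two unconstrained coordinates per sample
back = ilr_inverse(z) * 100.0   # back-transformed percentages
```

In a workflow of the kind the abstract describes, a model would be fitted to the transformed coordinates (here `z`), and predictions would be back-transformed so that sand, silt, and clay again sum to 100 %. Because the ILR basis is orthonormal, Euclidean distances between ILR coordinates equal the Aitchison distance between the original compositions.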