Systematic comparison of five machine-learning models in classification and interpolation of soil particle size fractions using different transformed data

Authors: Mo Zhang, Wenjiao Shi, Ziwei Xu

DOI: 10.5194/HESS-24-2505-2020

Keywords: Spearman's rank correlation coefficient, Explained sum of squares, Silt, Mathematics, Soil texture, Soil test, Statistics, Interpolation, Random forest, Mean squared error

Abstract: Soil texture and soil particle size fractions (PSFs) play an increasing role in physical, chemical, and hydrological processes. Many previous studies have used machine-learning and log-ratio transformation methods for soil texture classification and soil PSF interpolation to improve the prediction accuracy. However, few reports have systematically compared their performance with respect to both classification and interpolation. Here, five machine-learning models, namely K-nearest neighbour (KNN), multilayer perceptron neural network (MLP), random forest (RF), support vector machines (SVM), and extreme gradient boosting (XGB), were combined with the original data and three log-ratio transformation methods, namely additive log ratio (ALR), centred log ratio (CLR), and isometric log ratio (ILR), and applied to evaluate soil texture and soil PSFs using both raw and log-ratio-transformed data from 640 soil samples in the Heihe River basin (HRB) in China. The results demonstrated that the log-ratio transformations decreased the skewness of the soil PSF data. For soil texture classification, RF and XGB showed better performance, with a higher overall accuracy and kappa coefficient. They were also recommended for evaluating the classification capacity on imbalanced data according to the area under the precision–recall curve (AUPRC). For soil PSF interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root-mean-square error (RMSE; sand had an RMSE of 15.09 %, silt 13.86 %, and clay 6.31 %), mean absolute error (MAE; sand had an MAE of 10.65 %, silt 9.99 %, and clay 5.00 %), Aitchison distance (AD; 0.84), and standardized residual sum of squares (STRESS; 0.61), and the highest Spearman rank correlation coefficient (RCC; sand 0.69, silt 0.67, and clay 0.69). STRESS was improved by using the log-ratio methods, especially CLR and ILR. Prediction maps from direct and indirect classification were similar in the middle and upper reaches of the HRB. However, the indirect classification maps based on log-ratio-transformed data provided more detailed information in the lower reaches of the HRB. There was a pronounced improvement of 21.3 % in the kappa coefficient when indirect methods were used for soil texture classification compared with direct methods. RF was recommended as the best strategy among the five machine-learning models, based on the accuracy evaluation of soil PSF interpolation and soil texture classification, and ILR was recommended for component-wise machine-learning models without multivariate treatment, considering the constrained nature of compositional data. In addition, XGB was preferred over the other models when the trade-off between accuracy and runtime was considered.
Our findings provide a reference for future works on the spatial prediction of soil PSFs and soil texture using machine-learning models with skewed distributions of soil PSF data over a large area.
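
The three log-ratio transformations and the Aitchison distance named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the choice of clay as the ALR denominator, the particular ILR basis, and the sample values are all assumptions made for illustration, and zero fractions (which would need replacement before taking logs) are not handled.

```python
import numpy as np

# Hypothetical orthonormal ILR basis for 3 parts (sand, silt, clay).
# Many valid bases exist; the abstract does not say which one was used.
BASIS = np.array([
    [1 / np.sqrt(2), -1 / np.sqrt(2), 0.0],
    [1 / np.sqrt(6), 1 / np.sqrt(6), -2 / np.sqrt(6)],
])

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio, using the last part (assumed: clay) as denominator."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log ratio: log of each part minus the row mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr(x):
    """Isometric log ratio: CLR coordinates projected onto BASIS."""
    return clr(x) @ BASIS.T

def ilr_inv(z):
    """Back-transform ILR coordinates to fractions that sum to 1."""
    return closure(np.exp(np.asarray(z, dtype=float) @ BASIS))

def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Made-up sand/silt/clay percentages, for illustration only.
psf = np.array([[60.0, 30.0, 10.0],
                [20.0, 55.0, 25.0]])

z = ilr(psf)            # unconstrained coordinates a model could be trained on
back = ilr_inv(z) * 100 # predictions mapped back to percentages summing to 100
```

Because ILR coordinates are unconstrained, each coordinate can be predicted by a separate model and the back-transform still yields fractions that sum to 100 %, which is one way to read the abstract's recommendation of ILR for component-wise models.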
