On estimating model complexity and prediction errors in multivariate calibration: generalized resampling by random sample weighting (RSW)

Authors: L. Xu, Q.-S. Xu, M. Yang, H.-Z. Zhang, C.-B. Cai

DOI: 10.1002/CEM.1323

Keywords: Generalization, Mathematics, Resampling, Mean squared error, Monte Carlo method, Cross-validation, Calibration (statistics), Bootstrapping (statistics), Weighting, Statistics

Abstract: The present paper focuses on determining the number of PLS components by using resampling methods such as cross validation (CV), Monte Carlo cross validation (MCCV), bootstrapping (BS), etc. To resample the training data, random non-negative weights are assigned to the original training samples and a sample-weighted PLS model is developed without increasing the computational burden much. Random weighting is a generalization of the traditional resampling methods and is expected to have a lower risk of getting an insufficient training set. For prediction, only the training samples with random weights less than a threshold value are selected, to ensure that the prediction samples have less influence on training. For complicated data, because the optimal number of PLS components is often not unique or readily distinguished and there might exist an optimal region of model complexity, the distribution of prediction errors can be more useful than a single value of root mean squared error of prediction (RMSEP). Therefore, the distribution of prediction errors is estimated by repeated random sample weighting (RSW) and used to determine model complexity. RSW is compared with its traditional counterparts like CV, MCCV, BS and a recently proposed randomization test method to demonstrate its usefulness. Copyright 2010 John Wiley & Sons, Ltd. Keywords: complexity; weighting; validation; bootstrapping
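
The resampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `weighted_pls1_coefs` and `rsw_rmsep`, the Uniform(0, 1) weight distribution, the default threshold of 0.2, and the way weights enter the PLS fit (weighted centering followed by scaling each row by the square root of its weight) are all assumptions made for the example.

```python
import numpy as np


def weighted_pls1_coefs(X, y, w, max_comp):
    """Fit PLS1 (NIPALS) on sample-weighted data.

    Assumption: weights are applied by weighted centering and by scaling each
    row with sqrt(weight); the paper's sample-weighted PLS may differ.
    Returns the weighted means and one coefficient vector per model size.
    """
    w = np.asarray(w, dtype=float)
    x_mean = w @ X / w.sum()
    y_mean = w @ y / w.sum()
    s = np.sqrt(w)
    Xk = (X - x_mean) * s[:, None]
    yk = (y - y_mean) * s

    W, P, q = [], [], []
    for _ in range(max_comp):
        wt = Xk.T @ yk                  # weight vector for this component
        wt /= np.linalg.norm(wt)
        t = Xk @ wt                     # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(wt)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)

    # Regression vector for each model size a: B_a = W_a (P_a^T W_a)^-1 q_a
    coefs = [W[:, :a] @ np.linalg.solve(P[:, :a].T @ W[:, :a], q[:a])
             for a in range(1, max_comp + 1)]
    return x_mean, y_mean, np.array(coefs)   # coefs: (max_comp, n_vars)


def rsw_rmsep(X, y, max_comp, n_rep=200, threshold=0.2, seed=0):
    """Repeat random sample weighting; return RMSEP per repetition and model size."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_rep):
        w = rng.uniform(0.0, 1.0, size=len(y))   # assumed weight distribution
        held = w < threshold                     # low-weight samples are predicted
        if not held.any():
            continue                             # nothing to predict in this draw
        x_mean, y_mean, coefs = weighted_pls1_coefs(X, y, w, max_comp)
        pred = y_mean + (X[held] - x_mean) @ coefs.T   # (n_held, max_comp)
        err = pred - y[held][:, None]
        rows.append(np.sqrt(np.mean(err ** 2, axis=0)))
    return np.array(rows)                        # (n_used_reps, max_comp)
```

Calling `rsw_rmsep(X, y, max_comp=5)` on a calibration matrix `X` and response vector `y` returns one row of RMSEP values per repetition and one column per number of components. The columns can then be compared as distributions (for example through medians, percentiles, or box plots) instead of relying on a single RMSEP value per model size.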

References (37)
Robert McGill, John W. Tukey, Wayne A. Larsen, Variations of Box Plots, The American Statistician, vol. 32, pp. 12-16 (1978), 10.1080/00031305.1978.10479236
Claus Borggaard, Hans Henrik Thodberg, Optimal minimal neural interpretation of spectra, Analytical Chemistry, vol. 64, pp. 545-551 (1992), 10.1021/AC00029A018
Yi Ping Du, Sumaporn Kasemsumran, Katsuhiko Maruo, Takehiro Nakagawa, Yukihiro Ozaki, Ascertainment of the number of samples in the validation set in Monte Carlo cross validation and the selection of model dimension with Monte Carlo cross validation, Chemometrics and Intelligent Laboratory Systems, vol. 82, pp. 83-89 (2006), 10.1016/J.CHEMOLAB.2005.07.004
S. Gourvénec, J.A. Fernández Pierna, D.L. Massart, D.N. Rutledge, An evaluation of the PoLiSh smoothed regression and the Monte Carlo Cross-Validation for the determination of the complexity of a PLS model, Chemometrics and Intelligent Laboratory Systems, vol. 68, pp. 41-51 (2003), 10.1016/S0169-7439(03)00086-8
Edward V. Thomas, Non-parametric statistical methods for multivariate calibration model selection and comparison, Journal of Chemometrics, vol. 17, pp. 653-659 (2003), 10.1002/CEM.833
Nicolaas (Klaas) M. Faber, Critical evaluation of a significance test for partial least squares regression, Analytica Chimica Acta, vol. 432, pp. 235-240 (2001), 10.1016/S0003-2670(00)01381-7
Qing-Song Xu, Yi-Zeng Liang, Monte Carlo cross validation, Chemometrics and Intelligent Laboratory Systems, vol. 56, pp. 1-11 (2001), 10.1016/S0169-7439(00)00122-2
Ian N. Wakeling, Jeff J. Morris, A test of significance for partial least squares regression, Journal of Chemometrics, vol. 7, pp. 291-304 (1993), 10.1002/CEM.1180070407
L. Xu, J. Jiang, W. Lin, Y. Zhou, H. Wu, G. Shen, R. Yu, Optimized sample-weighted partial least squares, Talanta, vol. 71, pp. 561-566 (2007), 10.1016/J.TALANTA.2006.04.039