Ascertainment of the number of samples in the validation set in Monte Carlo cross validation and the selection of model dimension with Monte Carlo cross validation

Authors: Yi Ping Du, Sumaporn Kasemsumran, Katsuhiko Maruo, Takehiro Nakagawa, Yukihiro Ozaki

DOI: 10.1016/J.CHEMOLAB.2005.07.004

Keywords: Latent variable, Cross-validation, Mathematics, Dimension (vector space), Mean squared error, Calibration (statistics), Data set, Partial least squares regression, Observational error, Statistics

Abstract: Monte Carlo cross validation (MCCV) is applied to two data sets, comprising 125 and 1643 near-infrared (NIR) spectra of biological samples, respectively, to ascertain the number of samples left out for MCCV and, consequently, the dimension of the PLS models. With a suitably selected validation set, the number of latent variables (LV) may be chosen correctly. The results show that the root mean squared error of calibration (RMSEC), the root mean squared error of cross validation (RMSECV), and the number of LV are sensitive when too many samples are left out. Based on this, RMSEC and RMSECV are suggested as criteria to assist in ascertaining the number of samples left out in MCCV. The method is easy and convenient to use. For a larger data set, more samples may be left out, but this number will decrease if the measurement error level is high.
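The MCCV procedure named in the abstract can be sketched as follows: repeatedly leave out a random subset of samples, fit on the rest, and pool the squared prediction errors into an RMSECV value for each candidate model dimension. This is only an illustrative sketch, not the authors' implementation. The function names (`mccv_rmsecv`, `fit_mean`, `fit_line`), the synthetic data, and the two toy models standing in for PLS models of different dimension are all assumptions made for the example.

```python
import math
import random


def mccv_rmsecv(xs, ys, fit, n_left_out, n_splits=200, seed=0):
    """Estimate RMSECV by Monte Carlo cross validation.

    fit(train_x, train_y) must return a function mapping x to a prediction.
    """
    n = len(xs)
    if not 0 < n_left_out < n:
        raise ValueError("n_left_out must be between 1 and n - 1")
    rng = random.Random(seed)
    sq_err_sum = 0.0
    for _ in range(n_splits):
        # Draw the validation samples at random, without replacement.
        val = set(rng.sample(range(n), n_left_out))
        train_x = [xs[i] for i in range(n) if i not in val]
        train_y = [ys[i] for i in range(n) if i not in val]
        predict = fit(train_x, train_y)
        sq_err_sum += sum((ys[i] - predict(xs[i])) ** 2 for i in val)
    # Pool the squared errors over all splits and left-out samples.
    return math.sqrt(sq_err_sum / (n_splits * n_left_out))


def fit_mean(train_x, train_y):
    # Toy "dimension 0" model (assumption): always predict the training mean.
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_line(train_x, train_y):
    # Toy "dimension 1" model (assumption): least-squares straight line.
    n = len(train_x)
    mx = sum(train_x) / n
    my = sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


# Synthetic, noise-free data (not taken from the paper).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# Candidate model dimensions, each scored with 8 of 20 samples left out.
candidates = {0: fit_mean, 1: fit_line}
scores = {dim: mccv_rmsecv(xs, ys, fit, n_left_out=8)
          for dim, fit in candidates.items()}
best_dim = min(scores, key=scores.get)
```

Rerunning the scoring step over a range of `n_left_out` values and watching how the error estimate and the selected dimension change is one way to explore the sensitivity the abstract describes. A real study would substitute PLS models with varying numbers of latent variables for the toy models.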

References (21)
G. Wahba, S. Wold, A completely automatic french curve: fitting spline functions by cross validation. Communications in Statistics-Theory and Methods, vol. 4, pp. 1-17 (1975), 10.1080/03610927508827223
S. Gourvénec, J.A. Fernández Pierna, D.L. Massart, D.N. Rutledge, An evaluation of the PoLiSh smoothed regression and the Monte Carlo Cross-Validation for the determination of the complexity of a PLS model. Chemometrics and Intelligent Laboratory Systems, vol. 68, pp. 41-51 (2003), 10.1016/S0169-7439(03)00086-8
Qing-Song Xu, Yi-Zeng Liang, Monte Carlo cross validation. Chemometrics and Intelligent Laboratory Systems, vol. 56, pp. 1-11 (2001), 10.1016/S0169-7439(00)00122-2
Katsuhiko Maruo, Mitsuhiro Tsurugi, Mamoru Tamura, Yukihiro Ozaki, In vivo noninvasive measurement of blood glucose by near-infrared diffuse-reflectance spectroscopy. Applied Spectroscopy, vol. 57, pp. 1236-1244 (2003), 10.1366/000370203769699090
Tormod Næs, Leverage and influence measures for principal component regression. Chemometrics and Intelligent Laboratory Systems, vol. 5, pp. 155-168 (1989), 10.1016/0169-7439(89)80012-7
Jun Shao, Linear Model Selection by Cross-validation. Journal of the American Statistical Association, vol. 88, pp. 486-494 (1993), 10.1080/01621459.1993.10476299
Qing-Song Xu, Yi-Zeng Liang, Yi-Ping Du, Monte Carlo cross-validation for selecting a model and estimating the prediction error in multivariate calibration. Journal of Chemometrics, vol. 18, pp. 112-120 (2004), 10.1002/CEM.858