Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data

Author: Richard G. Brereton

DOI: 10.1016/J.TRAC.2006.10.005

Abstract: This article discusses problems of validating classification models, especially in datasets where sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator for the success of a model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. The article illustrates the use of a common method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables. An aim of the classifier is to determine whether the null hypothesis (there is no distinction between the two classes) can be rejected. Autoprediction gives 84.5% CC. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a %CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups. Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in the model) and independent validation; to overcome this, the data should be split into three groups. There are often difficulties with model building if validation and optimisation have been done on different groups of samples, especially using iterative methods, each group being modelled using different properties, such as ...
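
The variable-selection pitfall described in the abstract can be illustrated with a small simulation. The sketch below is not the article's procedure: it swaps in a simple nearest-centroid rule in place of D-PLS, ranks variables by the absolute difference between class means, and uses a two-thirds/one-third split. All of these choices, along with the function names and the number of retained variables (`n_keep=20`), are assumptions made only for illustration. The data are pure noise, so an honest test-set %CC should sit near 50%.

```python
import random

def class_means(X, y, rows, cols):
    """Mean of each column in `cols` for class 0 and class 1, using only `rows`."""
    means = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    counts = {0: 0, 1: 0}
    for i in rows:
        counts[y[i]] += 1
        for j, c in enumerate(cols):
            means[y[i]][j] += X[i][c]
    for k in (0, 1):
        means[k] = [m / counts[k] for m in means[k]]
    return means

def select_variables(X, y, rows, n_keep):
    """Keep the n_keep columns with the largest absolute class-mean difference."""
    all_cols = list(range(len(X[0])))
    m = class_means(X, y, rows, all_cols)
    ranked = sorted(all_cols, key=lambda c: abs(m[0][c] - m[1][c]), reverse=True)
    return ranked[:n_keep]

def percent_cc(X, y, train, test, cols):
    """Fit a nearest-centroid rule on `train`; return %CC on `test`."""
    m = class_means(X, y, train, cols)
    correct = 0
    for i in test:
        d0 = sum((X[i][c] - m[0][j]) ** 2 for j, c in enumerate(cols))
        d1 = sum((X[i][c] - m[1][j]) ** 2 for j, c in enumerate(cols))
        pred = 0 if d0 <= d1 else 1
        correct += (pred == y[i])
    return 100.0 * correct / len(test)

def run_once(rng, n_samples=200, n_vars=200, n_keep=20):
    """One random dataset; returns (autoprediction, biased test, proper test) %CC."""
    # Pure noise: the class labels carry no information about X.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rows = list(range(n_samples))
    rng.shuffle(rows)
    n_train = 2 * n_samples // 3
    train, test = rows[:n_train], rows[n_train:]

    cols_all = select_variables(X, y, rows, n_keep)     # selection sees test samples
    cols_train = select_variables(X, y, train, n_keep)  # selection sees training only

    auto = percent_cc(X, y, rows, rows, cols_all)       # model tested on its own data
    biased = percent_cc(X, y, train, test, cols_all)
    proper = percent_cc(X, y, train, test, cols_train)
    return auto, biased, proper

def simulate(n_repeats=20, seed=0):
    """Average the three %CC figures over several independent random datasets."""
    rng = random.Random(seed)
    results = [run_once(rng) for _ in range(n_repeats)]
    return tuple(sum(r[k] for r in results) / n_repeats for k in range(3))

if __name__ == "__main__":
    auto, biased, proper = simulate()
    print(f"autoprediction %CC:                 {auto:.1f}")
    print(f"test %CC, selection on all samples: {biased:.1f}")
    print(f"test %CC, selection on training set: {proper:.1f}")
```

Under these assumptions, the first two figures should come out noticeably above 50% even though no class structure exists, while the third should stay close to chance. The exact values depend on the classifier and selection rule, so they are not expected to match the 84.5% reported for D-PLS.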

References (8)
Richard G. Brereton, Applied Chemometrics for Scientists (2007)
Robert J. Tibshirani, Bradley Efron, An introduction to the bootstrap (1993)
R. De Maesschalck, D. Jouan-Rimbaud, D.L. Massart, The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems, vol. 50, pp. 1-18 (2000), 10.1016/S0169-7439(99)00047-7
Marianne Defernez, E. Katherine Kemsley, The use and misuse of chemometrics for treating classification problems, Trends in Analytical Chemistry, vol. 16, pp. 216-221 (1997), 10.1016/S0165-9936(97)00015-0
Svante Wold, Pattern recognition by means of disjoint principal components models, Pattern Recognition, vol. 8, pp. 127-139 (1976), 10.1016/0031-3203(76)90014-5