Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data

DOI: 10.1016/J.TRAC.2006.10.005


Abstract: This article discusses problems of validating classification models, especially in datasets where sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator for the success of a model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. It illustrates the use of a common method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables. An aim of the classifier is to determine whether the null hypothesis (that there is no distinction between the two classes) can be rejected. Autoprediction gives 84.5% CC. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a %CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups. Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in the model) and independent validation; to overcome this, the data should be split into three groups. There are often difficulties with model building if validation and optimisation have been done on different groups of samples using iterative methods, each group being modelled using different properties, such as ... or ...
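The experiment outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the article: a one-component PLS1 model fitted by NIPALS stands in for D-PLS, classes are coded +1/-1 with a decision threshold of 0, variables are ranked by the absolute difference between class means, the 20 top-ranked variables are kept, and results are averaged over 20 stratified 50/50 splits. The function names (`pls1_coefficients`, `percent_cc`, `top_variables`, `stratified_split`) are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components=1):
    """Fit PLS1 by NIPALS on centred data; return (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return x_mean, y_mean, b

def percent_cc(model, X, y):
    """Percentage correctly classified, thresholding predictions at 0."""
    x_mean, y_mean, b = model
    y_hat = (X - x_mean) @ b + y_mean
    return 100.0 * np.mean(np.where(y_hat >= 0, 1.0, -1.0) == y)

def top_variables(X, y, k):
    """Indices of the k variables with the largest class-mean difference."""
    score = np.abs(X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0))
    return np.argsort(score)[-k:]

def stratified_split(y, rng):
    """Put half of each class in the training set, the rest in the test set."""
    train = np.concatenate([
        rng.permutation(np.flatnonzero(y == c))[: np.sum(y == c) // 2]
        for c in (-1.0, 1.0)
    ])
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))     # pure noise: the null hypothesis holds
y = np.repeat([1.0, -1.0], 100)         # arbitrary labels, 100 per class

# Autoprediction: the model is tested on the samples used to build it.
auto_cc = percent_cc(pls1_coefficients(X, y), X, y)

k = 20
leaky_vars = top_variables(X, y, k)     # selection has seen the test samples
leaky, fair = [], []
for _ in range(20):
    train, test = stratified_split(y, rng)
    m = pls1_coefficients(X[train][:, leaky_vars], y[train])
    leaky.append(percent_cc(m, X[test][:, leaky_vars], y[test]))
    fair_vars = top_variables(X[train], y[train], k)  # training set only
    m = pls1_coefficients(X[train][:, fair_vars], y[train])
    fair.append(percent_cc(m, X[test][:, fair_vars], y[test]))
leaky_cc, fair_cc = float(np.mean(leaky)), float(np.mean(fair))

print(f"autoprediction %CC: {auto_cc:.1f}")
print(f"test %CC, variables selected on all samples: {leaky_cc:.1f}")
print(f"test %CC, variables selected on training set only: {fair_cc:.1f}")
```

Because the data are random, any test-set %CC well above 50% reflects information leaking from the test samples into the model. In this sketch, only the run that selects variables inside each training set should hover near chance, while autoprediction and selection on all samples should both look deceptively good.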


