Complete validation for classification and class modeling procedures with selection of variables and/or with additional computed variables

Authors: M. Forina, P. Oliveri, M. Casale

DOI: 10.1016/J.CHEMOLAB.2010.04.011

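Abstract: The evaluation of the predictive ability of a model is an essential step in all chemometric techniques, so it must be performed very carefully. However, in the case of selection of relevant variables (an essential step for data sets with many, frequently thousands of, variables), the selection is generally performed using all the available objects. In some recent classification and class modeling techniques, Mahalanobis distances or leverages from the centroids of the categories in the problem are computed from the original or selected variables and then added to the variables. Here too, the additional variables are generally computed using all the objects. The consequence is an overestimate of the prediction ability, which is large when the ratio between the number of objects and the number of variables is rather low, so that the variance-covariance matrix is unstable. In this paper, the correct validation procedures are described for the cases of selection of variables and of addition of computed variables, and their effect on the estimates of the prediction ability is compared with the estimates obtained with insufficient validation strategies.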
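The difference between the two validation strategies can be illustrated with a minimal sketch. It is not the authors' procedure: the pure-noise synthetic data, the t-like ranking score, the nearest-centroid classifier, and every function name below are assumptions chosen only to keep the example short and free of dependencies. Variables are selected either once on all objects (insufficient validation) or again inside each cancellation group using the training objects only (complete validation). Because the data carry no class information, any prediction rate well above chance reflects the optimistic bias described in the abstract.

```python
import random

def select_variables(X, y, rows, k):
    """Return the k variables with the largest two-class separation on `rows` only."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        scores.append((abs(ma - mb) / (va + vb + 1e-12) ** 0.5, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def nearest_centroid(X, y, train, test, variables):
    """Assign each test object to the class with the closest training centroid."""
    centroids = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        centroids[c] = [sum(X[i][j] for i in members) / len(members)
                        for j in variables]
    predictions = []
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2
                       for j, m in zip(variables, centroids[c]))
                for c in (0, 1)}
        predictions.append(min(dist, key=dist.get))
    return predictions

def cross_validate(X, y, k, n_groups, select_inside):
    """Prediction rate by cross-validation with interleaved cancellation groups."""
    rows = list(range(len(X)))
    # Insufficient strategy: variables are chosen once, using every object.
    fixed = None if select_inside else select_variables(X, y, rows, k)
    correct = 0
    for g in range(n_groups):
        test = [i for i in rows if i % n_groups == g]
        train = [i for i in rows if i % n_groups != g]
        # Complete strategy: selection is repeated on the training objects only.
        variables = select_variables(X, y, train, k) if select_inside else fixed
        predictions = nearest_centroid(X, y, train, test, variables)
        correct += sum(p == y[i] for p, i in zip(predictions, test))
    return correct / len(X)

# Hypothetical data: 40 objects, 500 pure-noise variables, two arbitrary classes.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(40)]
y = [i % 2 for i in range(40)]

biased = cross_validate(X, y, k=10, n_groups=5, select_inside=False)
complete = cross_validate(X, y, k=10, n_groups=5, select_inside=True)
print("selection on all objects:", biased)
print("selection inside each group:", complete)
```

The same principle would apply to additional computed variables such as Mahalanobis distances or leverages: under this reading, the centroids and the variance-covariance matrix would be recomputed from the training objects of each cancellation group before the left-out objects are projected.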
