Uninformative variable elimination assisted by Gram–Schmidt Orthogonalization/successive projection algorithm for descriptor selection in QSAR

Authors: Nematollah Omidikia, Mohsen Kompany-Zareh

DOI: 10.1016/J.CHEMOLAB.2013.07.008

Abstract: This study reports the use of Uninformative Variable Elimination (UVE) as a robust variable selection method. Each regression coefficient represents the contribution of the corresponding variable to the established model, but in the presence of uninformative variables as well as collinearity, the reliability of a coefficient's magnitude is suspect. The Successive Projection Algorithm (SPA) and Gram–Schmidt Orthogonalization (GSO) were implemented as pre-selection techniques for removing redundancy among the variables in the model. Uninformative variable elimination-partial least squares (UVE-PLS) was then performed on the pre-selected data set, and a C value was calculated for each descriptor. In this case, the C values of UVE assisted by SPA or GSO could be used to rank the descriptors according to their importance. Leave-many-out cross-validation (LMO-CV) was applied to the ordered descriptors to select the optimal number of descriptors. Two data sets were utilized: the Selwood data, including 31 molecules and 53 descriptors, and anti-HIV data, including 107 molecules and 160 descriptors. When the approach was applied to the data, the results were satisfactory not only in the prediction ability of the constructed model but also in the number of selected informative descriptors. By applying GSO-UVE-PLS to the Selwood data under an optimized condition, seven descriptors out of 53 were selected, with q² = 0.769 and R² = 0.915. Likewise, by applying SPA-UVE-PLS to the anti-HIV data, nine descriptors out of 160 were selected, with q² = 0.81, R² = 0.84 and Q²F3 = 0.8.
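The two stages named in the abstract, projection-based pre-selection followed by ranking on a reliability score, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the commonly used UVE definition of the C value as the mean of a regression coefficient divided by its standard deviation across resampled models. It also assumes the resampled PLS coefficient vectors have been computed elsewhere and are passed in as plain lists. The function names `spa_select`, `uve_c_values` and `rank_descriptors` are hypothetical.

```python
from statistics import mean, stdev


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(columns, start, n_select):
    """Successive-projection style pre-selection (illustrative sketch).

    columns  : list of descriptor columns, each a list of sample values
    start    : index of the first selected column (an assumption of this sketch)
    n_select : number of columns to pick
    Each step projects the remaining columns onto the orthogonal complement
    of the last selected column (a Gram-Schmidt step) and picks the column
    with the largest residual norm, i.e. the least collinear one.
    """
    work = [list(col) for col in columns]  # working copies, originals untouched
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = dot(ref, ref)
        remaining = [j for j in range(len(work)) if j not in selected]
        if ref_norm2 == 0 or not remaining:
            break  # nothing left to project out or to choose from
        for j in remaining:
            coef = dot(work[j], ref) / ref_norm2
            work[j] = [x - coef * r for x, r in zip(work[j], ref)]
        selected.append(max(remaining, key=lambda j: dot(work[j], work[j])))
    return selected


def uve_c_values(coef_sets):
    """Assumed C value: mean(b_j) / std(b_j) over resampled coefficient vectors.

    coef_sets holds one coefficient vector per resampled model
    (for example, one PLS fit per left-out subset).
    """
    return [mean(b) / stdev(b) for b in zip(*coef_sets)]


def rank_descriptors(c_values):
    """Descriptor indices ordered from most to least reliable by |C|."""
    return sorted(range(len(c_values)), key=lambda j: abs(c_values[j]), reverse=True)
```

In a workflow like the one described, `spa_select` would run first on the descriptor columns, the coefficient vectors of models fitted on the retained columns would go to `uve_c_values`, and cross-validation over the ordering from `rank_descriptors` would then decide how many top-ranked descriptors to keep. Standard UVE additionally derives a cutoff from artificially added random variables, which this sketch omits.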
