STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products

Authors: Nicholas T. Cockroft, Xiaolin Cheng, James R. Fuchs

DOI: 10.1021/ACS.JCIM.9B00489

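Abstract: Target fishing is the process of identifying the protein target of a bioactive small molecule. To do so experimentally requires a significant investment of time and resources, which can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products has not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a data set composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This data set contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic compound data set comprising 107 190 compound-target pairs for 88 728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and by benchmarking on the newly collected natural product data set. Strong performance was observed for each model during cross-validation, with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores ranging from 0.89 to 0.94. When tested on the natural product data set, performance decreased dramatically, with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores ranging from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine the model predictions, improved the ability to correctly predict the protein targets of natural products and raised the BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available for use to aid in the target identification of natural products.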
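The stacking idea described in the abstract (k-nearest neighbors, random forest, and multilayer perceptron base models combined by a logistic regression meta-classifier) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are random toy features binarized to stand in for molecular fingerprints, the three classes stand in for protein targets, and all hyperparameters and the 5-fold split are arbitrary choices made to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Toy stand-in data (assumption): 3 classes play the role of protein targets,
# and 64 binarized features play the role of a fingerprint bit vector.
X, y = make_classification(
    n_samples=300, n_features=64, n_informative=12, n_classes=3, random_state=0
)
X = (X > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models named in the abstract; hyperparameters here are illustrative only.
base_models = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
]

# The meta-classifier is trained on out-of-fold class probabilities from the
# base models, so it never sees predictions made on a model's own training rows.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Per-compound class probabilities, then classes ranked from most to least likely.
proba = stack.predict_proba(X_test)
ranked_targets = np.argsort(-proba, axis=1)
```

Ranking the classes by predicted probability mirrors how a target fishing model is typically consumed: the output for a query compound is an ordered list of candidate targets rather than a single label.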
