Authors: Nicholas T. Cockroft, Xiaolin Cheng, James R. Fuchs
Keywords:
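Abstract: Target fishing is the process of identifying the protein target of a bioactive small molecule. To do so experimentally requires a significant investment of time and resources, which can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products have not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a dataset composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This dataset contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic dataset comprising 107,190 compound-target pairs for 88,728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and by benchmarking on the newly collected natural product dataset. Strong performance was observed for each model during cross-validation, with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores from 0.89 to 0.94. When tested on the natural product dataset, performance dramatically decreased, with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine model predictions, improved the ability to correctly predict the protein targets of natural products, raising the BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available for use to aid in the target identification of natural products.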
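The stacking idea in the abstract, a logistic regression meta-classifier that combines the predictions of several base models, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`fit_meta_classifier`, `stacked_score`), the plain gradient-descent fitting, the hyperparameters, and the toy probabilities are all assumptions made for illustration. Each input row stands for the probabilities that three hypothetical base models (k-nearest neighbors, random forest, multilayer perceptron) assign to one compound-target pair.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_classifier(base_probs, labels, lr=1.0, epochs=5000):
    """Fit a logistic regression meta-classifier by batch gradient descent.

    base_probs: list of rows, one probability per base model.
    labels:     1 if the compound-target pair is a true pair, else 0.
    lr and epochs are illustrative defaults, not values from the paper.
    """
    n_models = len(base_probs[0])
    n_rows = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            z = bias + sum(w * p for w, p in zip(weights, row))
            err = sigmoid(z) - y  # gradient of log-loss w.r.t. z
            for j, p in enumerate(row):
                grad_w[j] += err * p
            grad_b += err
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_rows
    return weights, bias


def stacked_score(weights, bias, probs):
    """Combine base-model probabilities into one stacked probability."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, probs)))


# Made-up toy data: columns are hypothetical [kNN, random forest, MLP] outputs.
train_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.9],
    [0.2, 0.1, 0.3],
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_classifier(train_probs, train_labels)
likely = stacked_score(weights, bias, [0.95, 0.90, 0.95])
unlikely = stacked_score(weights, bias, [0.05, 0.10, 0.05])
print(f"likely pair: {likely:.3f}, unlikely pair: {unlikely:.3f}")
```

In a real stacking setup the meta-classifier would normally be trained on out-of-fold predictions from the base models, so that it does not learn from probabilities the base models produced on their own training data. The sketch skips that step for brevity.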