Authors: IN SOCK JANG, ELIAS CHAIBUB NETO, JUSTIN GUINNEY, STEPHEN H. FRIEND, ADAM A. MARGOLIN
DOI: 10.1142/9789814583220_0007
Keywords:
Abstract: Large-scale pharmacogenomic screens of cancer cell lines have emerged as an attractive pre-clinical system for identifying tumor genetic subtypes with selective sensitivity to targeted therapeutic strategies. Application of modern machine learning approaches to pharmacogenomic datasets has demonstrated the ability to infer genomic predictors of compound sensitivity. Such modeling approaches entail many analytical design choices; however, a systematic study evaluating the relative performance attributable to each design choice is not yet available. In this work, we evaluated over 110,000 different models, based on a multifactorial experimental design testing systematic combinations of modeling factors within several categories of modeling choices, including: type of algorithm, type of molecular feature data, compound being predicted, method of summarizing compound sensitivity values, and whether predictions are based on discretized or continuous response values. Our results suggest that model input data (type of molecular features and choice of compound) are the primary factors explaining model performance, followed by choice of algorithm. Our results also provide a statistically principled set of recommended modeling guidelines, including: using elastic net or ridge regression with input features from all genomic profiling platforms, most importantly gene expression features, to predict continuous-valued sensitivity scores summarized using the area under the dose response curve, with pathway targeted compounds most likely to yield the most accurate predictors. In addition, our study provides a publicly available resource of all modeling results, an open source code base, and an experimental design for researchers throughout the community to build on our results and assess novel methodologies or applications in related predictive modeling problems.
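The multifactorial design and the area-under-the-curve summary mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the factor levels in `FACTORS` (including `ic50`), the helper names `factorial_design` and `dose_response_auc`, and the use of the trapezoidal rule are all assumptions made for illustration, since the abstract names the categories of modeling choices but not their levels or the integration method.

```python
from itertools import product

# Hypothetical factor levels, chosen for illustration only; the abstract
# names the categories of choices but not the full list of levels.
FACTORS = {
    "algorithm": ["elastic_net", "ridge"],
    "feature_data": ["gene_expression", "all_platforms"],
    "sensitivity_summary": ["auc", "ic50"],
    "response_type": ["continuous", "discretized"],
}


def factorial_design(factors):
    """Return one dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def dose_response_auc(log_doses, responses):
    """Trapezoidal area under a dose-response curve (an assumed method).

    log_doses must be strictly increasing; responses are the measured
    values at each dose, for example fraction of growth inhibition.
    """
    if len(log_doses) != len(responses) or len(log_doses) < 2:
        raise ValueError("need at least two matched dose/response points")
    area = 0.0
    for i in range(1, len(log_doses)):
        width = log_doses[i] - log_doses[i - 1]
        if width <= 0:
            raise ValueError("log_doses must be strictly increasing")
        area += width * (responses[i] + responses[i - 1]) / 2.0
    return area


design = factorial_design(FACTORS)
print(len(design))  # 2 * 2 * 2 * 2 = 16 combinations
print(dose_response_auc([0.0, 1.0, 2.0], [0.0, 0.5, 1.0]))  # 1.0
```

Each entry of `design` describes one model configuration to fit and score; crossing every level with every other is what allows performance differences to be attributed to individual factors afterwards.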