Authors: Stephen R. Piccolo, Nathan P. Golightly, Dustin B. Miller, Avery Mecham, Jérémie L. Johnson
DOI: 10.1101/2021.05.07.442940
Keywords:
Abstract: By classifying patients into subgroups, clinicians can provide more effective care than by using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Diverse types of biomarkers have been proposed for assigning patients to subgroups. For example, DNA variants in tumors show promise as biomarkers; however, tumors exhibit considerable genomic heterogeneity. As an alternative, transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and that have been implemented in general-purpose, open-source machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection in nested cross-validation folds. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
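
The nested cross-validation design mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' benchmark pipeline: the use of scikit-learn, the synthetic data standing in for a gene-expression matrix, the RBF-kernel SVM, the ANOVA F-test feature selector, and the grid values are all assumptions chosen for illustration. The point it shows is that feature selection and hyperparameter tuning run only inside the inner folds, so the outer folds give a performance estimate that is not biased by the tuning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for an expression matrix
# (120 "patients" x 500 "genes", binary class variable).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=15,
    n_redundant=0, random_state=0,
)

# Scaling, univariate feature selection, and the classifier live in one
# pipeline, so each step is re-fit on the training portion of every fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC(kernel="rbf")),
])

# Hypothetical search grid: number of retained features and SVM penalty.
param_grid = {
    "select__k": [10, 50, 200],
    "clf__C": [0.1, 1.0, 10.0],
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: choose hyperparameters using only the outer-training data.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="roc_auc")

# Outer loop: score the tuned model on folds it never saw during tuning.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")

print("AUROC per outer fold:", np.round(outer_scores, 3))
print("Mean AUROC:", round(float(outer_scores.mean()), 3))
```

Swapping the `clf` step for another estimator while keeping the folds and the grid-search wrapper fixed is one way to compare algorithms with other factors held constant, which is the kind of comparison the abstract describes.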