Authors: Catherine Tuglus, Mark J. van der Laan
DOI:
Keywords:
Abstract: More often than not, biomarker studies analyze large quantities of variables with complicated and generally unknown correlation structure. There are numerous statistical methods which attempt to unravel these structures and determine the underlying mechanism through the identification of causally related biomarkers. Results from these methods are difficult to interpret and nearly impossible to compare across studies. The FDA has recently called for a standardization protocol for biomarker detection. In response, we propose targeted variable importance (tVIM) as a standardized method for biomarker discovery. Through the use of targeted maximum likelihood, tVIM provides double robust estimates along with formal inference. Under specified conditions, these measures have a biologically interpretable causal effect interpretation, allowing reproducibility across populations. In this analysis, we consider four different variable importance measures provided by three methods: univariate linear regression (LM), LASSO-penalized multiple regression (Q), and two measures from randomForest (RF1 and RF2). Their performance is compared with that of tVIM in simulation under conditions of increasing correlation. We are interested in their ability to distinguish the "true" relevant biomarkers from correlated decoy biomarkers. Comparisons are based on the resulting ranked list for each method, using p-values when available. In simulation, tVIM coupled with data-adaptive model selection outperforms univariate regression and LASSO, and it is more resilient to increases in correlation. In application, we apply all methods to the Golub et al. (1999) leukemia data and compare the resulting gene lists for biological relevance. Both tVIM and LM are also applied to the van't Veer breast cancer data, and we compare their top 10 most important genes. From these results, tVIM appears to rank genes at least as well as the other methods. Given extreme correlations, approaches to reduce bias and provide more realistic inference are discussed.
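The abstract describes tVIM only at a high level. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It estimates a linear variable importance parameter beta, defined by E[Y | A, W] - E[Y | A = 0, W] = beta * A, for a candidate biomarker A adjusted for the remaining biomarkers W. The sketch assumes linear working models fitted by least squares. It starts from a deliberately crude initial fit of Y on A alone, then applies one targeting (fluctuation) step along the covariate A - E[A | W]. Because the update uses a correctly specified model for E[A | W], the estimate moves back toward the true beta even though the initial fit ignores W. This illustrates the double robustness mentioned above. The helper names (`targeted_vim`, `rank_biomarkers`), the approximate influence-curve standard error, and the normal-approximation p-value are illustrative assumptions.

```python
import math

import numpy as np


def _ols_fit(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def targeted_vim(y, a, w):
    """Hypothetical sketch: targeted estimate of beta in
    E[Y|A,W] - E[Y|A=0,W] = beta * A, assuming linear working models.

    y: outcome (n,), a: candidate biomarker (n,), w: other biomarkers (n, p).
    Returns (beta0, beta_star, se, p_value).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), w])
    # Working model for E[A | W]; h = A - E[A | W] is the fluctuation covariate.
    h = a - X @ _ols_fit(X, a)
    # Deliberately crude initial fit Q0(A) = c0 + beta0 * A that ignores W.
    c0, beta0 = _ols_fit(np.column_stack([np.ones(n), a]), y)
    q0 = c0 + beta0 * a
    # Targeting step: least-squares fluctuation Q* = Q0 + eps * h.
    eps = np.sum(h * (y - q0)) / np.sum(h * h)
    q_star = q0 + eps * h
    # Q*(A,W) - Q*(0,W) = (beta0 + eps) * A, so the targeted effect is:
    beta_star = beta0 + eps
    # Approximate influence curve of the estimating equation sum h (Y - Q*) = 0.
    ic = h * (y - q_star) / np.mean(h * a)
    se = np.std(ic, ddof=1) / math.sqrt(n)
    # Two-sided p-value from a normal approximation.
    p_value = math.erfc(abs(beta_star / se) / math.sqrt(2))
    return beta0, beta_star, se, p_value


def rank_biomarkers(y, x):
    """Rank the columns of x by the p-value of their targeted importance,
    adjusting each candidate for all remaining columns."""
    results = []
    for j in range(x.shape[1]):
        w = np.delete(x, j, axis=1)
        _, beta, _, p = targeted_vim(y, x[:, j], w)
        results.append((j, beta, p))
    return sorted(results, key=lambda r: r[2])
```

The ranking helper mirrors the comparison described in the abstract, which orders biomarkers by p-value. In a toy simulation with a true biomarker and a strongly correlated decoy, adjusting each candidate for all the others is what lets the true biomarker separate from its decoy. With near-collinear biomarkers, E[A | W] becomes almost deterministic, h shrinks toward zero, and the standard error inflates. This is one face of the bias and variance issues under extreme correlation that the abstract says are discussed.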