Parallelized prediction error estimation for evaluation of high-dimensional models

Authors: Christine Porzelius, Harald Binder, Martin Schumacher

DOI: 10.1093/BIOINFORMATICS/BTP062

Keywords: Computer cluster, Estimation, Mean squared prediction error, Advice (programming), Resampling, High dimensional, Data mining, Computer science, Interface (computing)

Abstract:

Summary: There is a multitude of new techniques that promise to extract predictive information in bioinformatics applications. It has been recognized that, as a first step, validation of the resulting model fits should rely on proper use of resampling techniques. However, this advice is frequently not followed, potential reasons being the difficulty of correct implementation and the computational demand. This is addressed by the R package peperr, which is designed for reliable prediction error estimation through resampling, potentially accelerated by parallel execution on a compute cluster. Its interface allows easy connection of newly developed fitting routines. Performance evaluation of the latter is furthermore guided by diagnostic plots, which help to detect specific problems due to high-dimensional data structures.

Availability: http://cran.r-project.org, http://www.imbi.uni-freiburg.de/parallel

Contact: cp@fdm.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.
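The abstract describes resampling-based prediction error estimation only at a high level. As a rough illustration of the idea (a minimal sketch in Python, not peperr's R interface; every function name below is hypothetical), the code estimates mean squared prediction error with the out-of-bag bootstrap: each replicate refits a pluggable fitting routine on a bootstrap sample and scores it on the observations that were not drawn. Because replicates are independent, they are mapped over a worker pool. A `ThreadPoolExecutor` is used only to keep the example self-contained; in CPython, threads do not speed up pure-Python numerical work, so real acceleration would need a process pool or a cluster scheduler.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


# Hypothetical fitting-routine interface (an assumption of this sketch):
# fit(xs, ys) -> model, predict(model, x) -> prediction.
def mean_model_fit(xs, ys):
    """Baseline routine: ignores the covariate and predicts the mean response."""
    return statistics.fmean(ys)


def mean_model_predict(model, x):
    return model


def least_squares_fit(xs, ys):
    """Simple linear regression y = a + b * x by ordinary least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # degenerate sample: fall back to a flat line
    return (my - b * mx, b)


def least_squares_predict(model, x):
    a, b = model
    return a + b * x


def one_bootstrap(seed, xs, ys, fit, predict):
    """Fit on one bootstrap sample; return squared errors on the out-of-bag points."""
    rng = random.Random(seed)  # per-replicate seed keeps results reproducible
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
    in_bag = set(idx)
    oob = [i for i in range(n) if i not in in_bag]
    model = fit([xs[i] for i in idx], [ys[i] for i in idx])
    return [(ys[i] - predict(model, xs[i])) ** 2 for i in oob]


def bootstrap_prediction_error(xs, ys, fit, predict, n_boot=200, seed=0, workers=4):
    """Pooled out-of-bag bootstrap estimate of mean squared prediction error."""
    seeds = [seed + b for b in range(n_boot)]
    # Replicates are independent, so they can run concurrently; map() returns
    # results in seed order, which makes the pooled average deterministic.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_rep = pool.map(lambda s: one_bootstrap(s, xs, ys, fit, predict), seeds)
        errors = [e for rep in per_rep for e in rep]
    return statistics.fmean(errors)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(50)]
    ys = [2 * x + rng.gauss(0, 0.5) for x in xs]
    for name, f, p in [("mean", mean_model_fit, mean_model_predict),
                       ("linear", least_squares_fit, least_squares_predict)]:
        print(name, round(bootstrap_prediction_error(xs, ys, f, p), 3))
```

The `fit`/`predict` pair stands in for the kind of interface the abstract mentions for connecting newly developed fitting routines; its exact shape is an assumption of this sketch. Note that the plain out-of-bag estimate tends to be pessimistic, since each fit sees only about 63.2% of the distinct observations.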
