Authors: R. Spang, F. Markowetz
Keywords:
Abstract: Objectives: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces. Methods: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: filtering, shrinkage and wrapper approaches. In small sample-size situations, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating parameters in nested-loop cross-validation. Results: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation. Conclusions: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
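Illustration: the difference between in-loop and out-of-loop feature selection named in the abstract can be sketched on simulated data. The following is a minimal sketch and not the authors' code. It assumes pure-noise "expression" values with uninformative labels, a simple mean-difference filter, and a nearest-centroid classifier; all function names, sample sizes and seeds are hypothetical choices made for the example. Because the labels carry no signal, an honest accuracy estimate should sit near chance, so any excess in the out-of-loop estimate reflects the selection bias.

```python
import random

def make_noise_data(n_samples, n_genes, rng):
    """Pure-noise expression matrix with balanced, uninformative labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def centroids(X, y, rows, genes):
    """Per-class mean of the chosen genes, computed over the chosen rows."""
    cents = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        cents[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return cents

def select_genes(X, y, rows, k):
    """Filter: keep the k genes with the largest absolute class-mean difference."""
    all_genes = range(len(X[0]))
    cents = centroids(X, y, rows, all_genes)
    scores = [abs(a - b) for a, b in zip(cents[0], cents[1])]
    return sorted(all_genes, key=lambda g: scores[g], reverse=True)[:k]

def predict(X, cents, i, genes):
    """Nearest-centroid rule restricted to the selected genes."""
    def dist(label):
        return sum((X[i][g] - c) ** 2 for g, c in zip(genes, cents[label]))
    return 0 if dist(0) <= dist(1) else 1

def cv_accuracy(X, y, k_genes, n_folds, in_loop, rng):
    """Cross-validated accuracy with gene selection inside or outside the loop."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Out-of-loop: genes are chosen once from ALL samples, test folds included.
    fixed = None if in_loop else select_genes(X, y, idx, k_genes)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # In-loop: genes are re-selected from the training part of each fold only.
        genes = select_genes(X, y, train, k_genes) if in_loop else fixed
        cents = centroids(X, y, train, genes)
        correct += sum(predict(X, cents, i, genes) == y[i] for i in test)
    return correct / len(X)

X, y = make_noise_data(n_samples=40, n_genes=1000, rng=random.Random(0))
out_of_loop_acc = cv_accuracy(X, y, 20, 5, in_loop=False, rng=random.Random(1))
in_loop_acc = cv_accuracy(X, y, 20, 5, in_loop=True, rng=random.Random(1))
print(f"out-of-loop: {out_of_loop_acc:.2f}, in-loop: {in_loop_acc:.2f}")
```

Both runs use the same folds, so the only difference is where the filter is applied. The same principle extends to tuning parameters: choosing them in an inner cross-validation loop, run on each outer training fold only, gives the nested-loop scheme the abstract refers to.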