Dimension reduction strategies for analyzing global gene expression data with a response.

Authors: Francesca Chiaromonte, Jessica Martinelli

DOI: 10.1016/S0025-5564(01)00106-7

Keywords: Dimensionality reduction, Categorical variable, Sliced inverse regression, Mathematics, Regression analysis, Sufficient dimension reduction, Regression, Expression (mathematics), Linear combination, Data mining

Abstract: The analysis of global gene expression data from microarrays is breaking new ground in genetics research, while confronting modelers and statisticians with many critical issues. In this paper, we consider data sets in which a categorical or continuous response is recorded, along with gene expression, on a given number of experimental samples. Data of this type are usually employed to create a prediction mechanism for the response based on gene expression, and to identify a subset of relevant genes. This defines a regression setting characterized by a dramatic under-resolution with respect to the predictors (genes), whose number exceeds the number of available observations (samples) by orders of magnitude. We present a dimension reduction strategy that, under appropriate assumptions, allows us to restrict attention to a few linear combinations of the original expression profiles, and thus to overcome under-resolution. These linear combinations can then be used to build and validate a regression model with standard techniques. Moreover, they can be used to rank the original predictors, and ultimately to select them through comparison with a background 'chance scenario' based on independent randomizations. We apply the strategy to publicly available data on leukemia classification.
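The strategy summarized above can be illustrated with a minimal sketch in Python with NumPy. It assumes a categorical response and uses sliced inverse regression (SIR, one of the methods named in the keywords) with one slice per class. SIR needs an invertible predictor covariance, which is impossible when genes far outnumber samples, so the sketch first compresses the genes to a few principal components. That PCA step, the gene score (the norm of each gene's row in the back-projected directions), and the 'chance scenario' implemented as random permutations of the response are assumptions made for illustration, not the authors' published procedure. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def pca_reduce(X, k):
    """Project centred samples onto the top-k principal components.

    X has shape (n_samples, p_genes) with p >> n. Returns the PC scores
    (n, k) and the gene loadings (p, k)."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: after centring, at most n - 1 components carry variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T
    return Xc @ loadings, loadings


def sir(Z, y, d=1):
    """Sliced inverse regression with one slice per class of y (hypothetical helper).

    Returns d directions in the coordinates of Z, shape (k, d), and the
    matching eigenvalues of the weighted slice-mean covariance."""
    n, k = Z.shape
    Zc = Z - Z.mean(axis=0)
    evals, evecs = np.linalg.eigh(Zc.T @ Zc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # Sigma^{-1/2}
    W = Zc @ inv_sqrt  # whitened predictors
    M = np.zeros((k, k))
    for c in np.unique(y):
        in_slice = y == c
        m = W[in_slice].mean(axis=0)
        M += in_slice.mean() * np.outer(m, m)  # weight = slice proportion
    vals, vecs = np.linalg.eigh(M)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:d]
    # Map whitened eigenvectors back to Z coordinates.
    return inv_sqrt @ vecs[:, order], vals[order]


def gene_scores(loadings, B):
    """Back-project SIR directions onto genes; score = norm of each gene's row."""
    G = loadings @ B
    G = G / np.linalg.norm(G, axis=0)  # unit length per direction
    return np.linalg.norm(G, axis=1)


def permutation_background(Z, y, d, n_perm, rng):
    """Leading SIR eigenvalue after each random permutation of the response."""
    return np.array([sir(Z, rng.permutation(y), d)[1][0] for _ in range(n_perm)])


# Synthetic example: 40 samples, 500 genes, two classes.
rng = np.random.default_rng(0)
n, p = 40, 500
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 4.0  # genes 0-9 are informative by construction

Z, L = pca_reduce(X, k=5)
B, lam = sir(Z, y, d=1)  # two classes give at most one SIR direction
top = np.argsort(gene_scores(L, B))[::-1][:10]
null = permutation_background(Z, y, d=1, n_perm=200, rng=rng)
print(sorted(top.tolist()))
print(lam[0], null.max())
```

Because PCA does not look at the response, only the SIR step has to be repeated for each permutation. Comparing the observed leading eigenvalue with the permuted ones checks whether the reduction captured more than chance structure; selecting individual genes would need a comparison of the same kind applied score by score.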
