Discriminative feature of cells characterizes cell populations of interest by a small subset of genes

Authors: Yasuyuki Ohkawa, Kazumitsu Maehara, Masatoshi Fujita, Takeru Fujii

DOI: 10.1101/2021.03.12.435089

Keywords:

Abstract: Statistical methods for detecting differences in individual gene expression are indispensable for understanding cell types. However, conventional statistical methods have faced difficulties associated with the inflation of P-values because of both the large sample size and the selection bias introduced by exploratory data analysis such as single-cell transcriptomics. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to using differentially expressed gene-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for the discrimination of a population of interest and variable selection to obtain a small subset of defining genes. We demonstrated that DFC prioritized gene pairs with non-independent expression in artificial data, and that it enabled us to characterize the muscle satellite cell population. The results revealed that DFC well captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population. DFC may complement differentially expressed gene-based methods for interpreting large data sets.
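As a rough illustration of the approach the abstract describes (binary classification by logistic regression, with an adaptive LASSO penalty selecting a small subset of genes), the following is a minimal sketch in standard-library Python. It is not the authors' implementation: the synthetic data, the function names (`fit`, `soft_threshold`), the proximal gradient solver, the ridge-penalized initial fit, and the values of `lam`, `gamma`, `step`, and `n_iter` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit(X, y, lam, weights, ridge=0.0, step=0.5, n_iter=300):
    """Proximal gradient descent for logistic regression with a
    weighted L1 penalty lam * sum_j weights[j] * |beta_j| and an
    optional ridge term. The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed label.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= step * g0 / n
        beta = [
            soft_threshold(beta[j] - step * (grad[j] / n + ridge * beta[j]),
                           step * lam * weights[j])
            for j in range(p)
        ]
    return b0, beta


# Synthetic data (an assumption): 200 "cells", 6 "genes".
# Only genes 0 and 1 determine membership in the population of interest.
rng = random.Random(0)
n_cells, n_genes = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0 for x in X]

# Step 1: initial estimate from a lightly ridge-penalized fit (no L1 term).
_, beta_init = fit(X, y, lam=0.0, weights=[1.0] * n_genes, ridge=0.01)

# Step 2: adaptive weights; genes with small initial coefficients
# receive a larger L1 penalty.
gamma = 1.0
weights = [1.0 / (abs(b) ** gamma + 1e-8) for b in beta_init]

# Step 3: weighted-L1 fit; genes with nonzero coefficients are selected.
b0, beta = fit(X, y, lam=0.1, weights=weights)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected gene indices:", selected)
```

In this sketch the adaptive weights are what distinguish the penalty from a plain LASSO: each gene's penalty is scaled by the inverse of its initial coefficient magnitude, so strongly discriminative genes are shrunk less than weak ones. In practice the penalty strength would typically be chosen by cross-validation rather than fixed by hand.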

References (46)
Tamás Nepusz, Gábor Csárdi. The igraph software package for complex network research. InterJournal Complex Systems, vol. 1695 (2006).
J. D. Storey, R. Tibshirani. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, vol. 100, pp. 9440-9445 (2003). 10.1073/PNAS.1530509100
Simon Fortier, Tara MacRae, Mélanie Bilodeau, Tobias Sargeant, Guy Sauvageau. Haploinsufficiency screen highlights two distinct groups of ribosomal protein genes essential for embryonic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America, vol. 112, pp. 2127-2132 (2015). 10.1073/PNAS.1418845112
Edyta Wróbel, Edyta Brzóska, Jerzy Moraczewski. M-cadherin and β-catenin participate in differentiation of rat satellite cells. European Journal of Cell Biology, vol. 86, pp. 99-109 (2007). 10.1016/J.EJCB.2006.11.004
Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, vol. 101, pp. 1418-1429 (2006). 10.1198/016214506000000735
Houtao Deng, George Runger. Gene selection with guided regularized random forest. Pattern Recognition, vol. 46, pp. 3483-3489 (2013). 10.1016/J.PATCOG.2013.05.018
Bradley Efron, Robert Tibshirani, John D Storey, Virginia Tusher. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, vol. 96, pp. 1151-1160 (2001). 10.1198/016214501753382129
Jerome Friedman, Trevor Hastie, Robert Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, vol. 33, pp. 1-22 (2010). 10.18637/JSS.V033.I01
Yoav Benjamini, Yosef Hochberg. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, pp. 289-300 (1995). 10.1111/J.2517-6161.1995.TB02031.X
Mark D Robinson, Davis J McCarthy, Gordon K Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, vol. 26, pp. 139-140 (2010). 10.1093/BIOINFORMATICS/BTP616