Identifying stably expressed genes from multiple RNA-Seq data sets

Authors: Bin Zhuo, Sarah Emerson, Jeff H. Chang, Yanming Di

DOI: 10.7717/PEERJ.2791

Keywords: Gene, Computational biology, RNA-Seq, Numerical stability, Genetics, Normalization (statistics), Interpretability, Gene expression, Mathematics, Total variation, Poisson distribution

Abstract: We examined RNA-Seq data on 211 biological samples from 24 different Arabidopsis experiments carried out by different labs. We grouped the samples according to tissue types, and in each of the groups, we identified genes that are stably expressed across biological samples, treatment conditions, and experiments. We fit a Poisson log-linear mixed-effect model to the read counts for each gene and decomposed the total variance into between-sample, between-treatment, and between-experiment variance components. Identifying stably expressed genes is useful for count normalization and differential expression analysis. The variance component analysis that we explore here is a first step towards understanding the sources and nature of the RNA-Seq count variation. When using a numerical measure to identify stably expressed genes, the outcome depends on multiple factors: the background sample set and the reference gene set used for count normalization, the technology used for measuring gene expression, and the specific numerical stability measure used. Since differential expression (DE) is measured by relative frequencies, we argue that DE is a relative concept. We advocate using an explicit reference gene set for count normalization to improve the interpretability of DE results, and recommend using a common reference gene set when analyzing multiple RNA-Seq experiments to avoid potentially inconsistent conclusions.
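
The variance decomposition mentioned in the abstract can be written out roughly as follows. This is only a sketch for a single gene under assumed notation: the offset s_j (for example, a normalized library size), the nesting of treatment groups within experiments, and the symbols xi, a, b, c are illustrative assumptions, not taken from the paper itself.

```latex
% Sketch of a Poisson log-linear mixed-effect model for one gene.
% Assumed notation: sample j belongs to treatment group t(j),
% which belongs to experiment e(j); s_j is a known offset.
\begin{align*}
  Y_j \mid \mu_j &\sim \mathrm{Poisson}(\mu_j), \\
  \log \mu_j &= \log s_j + \xi + a_{e(j)} + b_{t(j)} + c_j, \\
  a_{e} &\sim N(0, \sigma^2_{\mathrm{exp}}), \quad
  b_{t} \sim N(0, \sigma^2_{\mathrm{trt}}), \quad
  c_{j} \sim N(0, \sigma^2_{\mathrm{sam}}), \\
  \sigma^2_{\mathrm{total}} &= \sigma^2_{\mathrm{exp}}
    + \sigma^2_{\mathrm{trt}} + \sigma^2_{\mathrm{sam}}.
\end{align*}
```

Under this sketch, the three random effects are taken to be independent, so the variance of the log mean relative frequency is the sum of the three components. A gene with a small total variance would be a candidate stably expressed gene.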
