Biological Averaging in RNA-Seq

Authors: Jeffery L. Dangl, Tatiana S. Mucyn, Surojit Biswas, Yash N. Agrawal, Corbin D. Jones

Keywords: Statistics, Sample size determination, Classifier (UML), Exploratory analysis, Cost efficiency, Efficiency, RNA-Seq, Regulome, Biological variation, Biology

Abstract: RNA-seq has become a de facto standard for measuring gene expression. Traditionally, RNA-seq experiments are mathematically averaged: they sequence the mRNA of individuals from different treatment groups, hoping to correlate phenotype with differences in arithmetic read count averages at shared loci of interest. Alternatively, the tissue from the same individuals may be pooled prior to sequencing, in what we refer to as a biologically averaged design. As mathematical averaging sequences all individuals, it controls for both biological and technical variation; however, is the statistical resolution gained always worth the additional cost? To compare biological and mathematical averaging, we examined theoretical and empirical estimates of statistical efficiency and relative cost efficiency. Though less efficient at a fixed sample size, we found that biological averaging can be more cost efficient than mathematical averaging. With this motivation, we developed a differential expression classifier, ICRBC, that can detect alternatively expressed genes between biologically averaged samples. In simulation studies, we found that biological averaging and subsequent analysis with our classifier performed comparably to existing methods, such as ASC, edgeR, and DESeq, especially when individuals were pooled evenly and less than 20% of the regulome was expected to be differentially regulated. In two technically distinct mouse datasets and one plant dataset, we found that our method was over 87% concordant with edgeR for the 100 most significant features. We therefore conclude that biological averaging may sufficiently control biological variation to a level at which differences in gene expression are detectable. In such situations, ICRBC can enable reliable exploratory analysis at a fraction of the cost, especially when interest lies in the most differentially expressed loci.
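The trade-off the abstract describes, between statistical efficiency at a fixed sample size and efficiency per unit cost, can be illustrated with a minimal sketch. It is not the paper's model: it assumes a simple additive variance decomposition (biological variance `sigma_b2` per individual, technical variance `sigma_t2` per library), perfectly even pooling, and a hypothetical cost structure. All function names and all numbers are illustrative assumptions.

```python
def var_mathematical(sigma_b2: float, sigma_t2: float, n: int) -> float:
    """Variance of a group mean when each of n individuals gets its own library.

    Assumption: each measurement carries independent biological and
    technical noise, so averaging n libraries divides both by n.
    """
    return (sigma_b2 + sigma_t2) / n


def var_biological(sigma_b2: float, sigma_t2: float, n: int, pools: int = 1) -> float:
    """Variance of a group mean when n individuals are pooled evenly into `pools` libraries.

    Assumption: pooling averages biological noise over all n individuals,
    while technical noise is averaged only over the number of libraries.
    """
    return sigma_b2 / n + sigma_t2 / pools


def cost(n_individuals: int, n_libraries: int,
         cost_per_individual: float, cost_per_library: float) -> float:
    """Hypothetical experiment cost: sample collection plus library sequencing."""
    return n_individuals * cost_per_individual + n_libraries * cost_per_library


def cost_efficiency(variance: float, total_cost: float) -> float:
    """Precision (1 / variance) obtained per unit of cost."""
    return 1.0 / (variance * total_cost)


# Illustrative values only: 8 individuals, biological variance 4, technical variance 1,
# 10 cost units per individual and 200 per sequenced library.
n = 8
v_math = var_mathematical(4.0, 1.0, n)
v_bio = var_biological(4.0, 1.0, n, pools=1)
c_math = cost(n, n, 10.0, 200.0)
c_bio = cost(n, 1, 10.0, 200.0)

print(f"mathematical: variance={v_math}, cost={c_math}")
print(f"biological:   variance={v_bio}, cost={c_bio}")
print("biological averaging more cost efficient:",
      cost_efficiency(v_bio, c_bio) > cost_efficiency(v_math, c_math))
```

With these made-up numbers, the pooled design has the larger variance at the same number of individuals but the higher precision per unit cost, which is the qualitative pattern the abstract reports. The conclusion reverses when technical variance dominates or when library cost is small compared with per-individual cost.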
