Authors: Jeffery L. Dangl, Tatiana S. Mucyn, Surojit Biswas, Yash N. Agrawal, Corbin D. Jones
DOI:
Keywords: Statistics, Sample size determination, Classifier (UML), Exploratory analysis, Cost efficiency, Efficiency, RNA-Seq, Regulome, Biological variation, Biology
Abstract: RNA-seq has become a de facto standard for measuring gene expression. Traditionally, RNA-seq experiments are mathematically averaged: they sequence the mRNA of individuals from different treatment groups, hoping to correlate phenotype with differences in arithmetic read count averages at shared loci of interest. Alternatively, the tissue from the same individuals may be pooled prior to sequencing, in what we refer to as a biologically averaged design. As mathematical averaging sequences all individuals, it controls for both biological and technical variation; however, is the statistical resolution gained always worth the additional cost? To compare biological and mathematical averaging, we examined theoretical and empirical estimates of statistical efficiency and relative cost efficiency. Though less efficient at a fixed sample size, we found that biological averaging can be more cost efficient than mathematical averaging. With this motivation, we developed a differential expression classifier, ICRBC, that can detect alternatively expressed genes between biologically averaged samples. In simulation studies, we found that biological averaging and subsequent analysis with our classifier performed comparably to existing methods, such as ASC, edgeR, and DESeq, especially when individuals were pooled evenly and less than 20% of the regulome was expected to be differentially regulated. In two technically distinct mouse datasets and one plant dataset, we found that our method was over 87% concordant with edgeR for the 100 most significant features. We therefore conclude that biological averaging may sufficiently control biological variation to a level at which differences in gene expression are detectable. In such situations, ICRBC can enable reliable exploratory analysis at a fraction of the cost, especially when interest lies in the most differentially expressed loci.