A Hierarchical Bayesian Mixture Model for Inferring the Expression State of Genes in Transcriptomes

Authors: Ammon Thompson, Michael R. May, Brian R. Moore, Artyom Kopp

DOI: 10.1101/711630

Abstract: Transcriptomes are key to understanding the relationship between genotype and phenotype. The ability to infer the expression state (active or inactive) of genes in the transcriptome offers unique benefits for addressing this issue. For example, qualitative changes in gene expression may underlie the origin of novel phenotypes, and expression states are readily comparable between tissues and species. However, inferring the expression state of genes is a surprisingly difficult problem, owing to the complex biological and technical processes that give rise to observed transcriptomic datasets. Here, we develop a hierarchical Bayesian mixture model that describes this complex process and allows us to infer the expression state of genes from replicate transcriptomic libraries. We explore the statistical behavior of this method with analyses of simulated datasets, where we demonstrate its ability to correctly infer true (known) expression states, and of empirical-benchmark datasets, where we demonstrate that the expression states inferred from RNA-seq datasets using our method are consistent with those based on independent evidence. The power of our method to correctly infer expression states is generally high and, remarkably, approaches the maximum possible power for this inference problem. We present an empirical analysis of primate-brain transcriptomes, which identifies genes that have a unique expression state in humans. Our method is implemented in the freely available R package zigzag.

Significance Statement: How do the cells of an organism, each with an identical genome, give rise to tissues of incredible phenotypic diversity? Key to answering this question is the transcriptome: the set of genes expressed in a given tissue. We would clearly benefit from the ability to identify qualitative differences in expression (whether a gene is active or inactive in a given tissue/species). Inferring the expression state of genes is surprisingly difficult, owing to the complex biological processes that give rise to transcriptomes, and to the vagaries of techniques used to generate transcriptomic datasets. We develop a hierarchical Bayesian mixture model that, by describing those biological and technical processes, allows us to infer the expression state of genes from replicate transcriptomic datasets.
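
To make the idea of inferring a binary expression state from a mixture concrete, the following is a minimal sketch only, and not the model implemented in zigzag. It assumes that log-scale expression values, averaged over replicate libraries, come from two normal components (inactive and active), and it fits point estimates by expectation-maximization rather than performing hierarchical Bayesian inference. It also ignores the library-level technical processes that the abstract says the full model describes. All function names, parameter values, and the simulated data are hypothetical and chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_sd(values, weights, mean, total_weight):
    """Weighted standard deviation, floored to avoid a collapsing component."""
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total_weight
    return max(math.sqrt(var), 1e-3)


def fit_two_state_mixture(values, n_iter=200):
    """Fit a two-component normal mixture by EM (a simplification, not zigzag).

    Component 0 is treated as 'inactive' and component 1 as 'active'.
    Returns (means, sds, weight_active, p_active), where p_active[i] is the
    estimated probability that values[i] belongs to the active component.
    """
    n = len(values)
    ordered = sorted(values)
    # Initialise the two means at the lower and upper quartiles.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(values) / n
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n) or 1.0
    sds = [spread, spread]
    weight_active = 0.5
    p_active = [0.5] * n
    for _ in range(n_iter):
        # E-step: probability of the active state for each gene.
        for i, v in enumerate(values):
            inactive = (1.0 - weight_active) * normal_pdf(v, means[0], sds[0])
            active = weight_active * normal_pdf(v, means[1], sds[1])
            total = inactive + active
            p_active[i] = active / total if total > 0.0 else 0.5
        # M-step: re-estimate the mixture weight, means, and spreads.
        p_inactive = [1.0 - p for p in p_active]
        n_active = sum(p_active)
        n_inactive = n - n_active
        if n_active < 1e-9 or n_inactive < 1e-9:
            break
        weight_active = n_active / n
        means[1] = sum(p * v for p, v in zip(p_active, values)) / n_active
        means[0] = sum(p * v for p, v in zip(p_inactive, values)) / n_inactive
        sds[1] = weighted_sd(values, p_active, means[1], n_active)
        sds[0] = weighted_sd(values, p_inactive, means[0], n_inactive)
    return means, sds, weight_active, p_active


def simulate_and_classify(seed=0, n_inactive=300, n_active=700, n_reps=3):
    """Simulate genes with known states, fit the mixture, and score the calls."""
    rng = random.Random(seed)
    truth, observed = [], []
    # Hypothetical log-expression distributions for each state.
    for is_active, count, mean, sd in ((False, n_inactive, -2.0, 1.0),
                                       (True, n_active, 4.0, 1.5)):
        for _ in range(count):
            gene_level = rng.gauss(mean, sd)
            # Each replicate library adds independent noise to the gene level.
            reps = [gene_level + rng.gauss(0.0, 0.3) for _ in range(n_reps)]
            truth.append(is_active)
            observed.append(sum(reps) / n_reps)
    means, sds, weight_active, p_active = fit_two_state_mixture(observed)
    calls = [p > 0.5 for p in p_active]
    accuracy = sum(c == t for c, t in zip(calls, truth)) / len(truth)
    return means, weight_active, accuracy


if __name__ == "__main__":
    means, weight_active, accuracy = simulate_and_classify()
    print("component means:", means)
    print("estimated fraction active:", weight_active)
    print("fraction of genes called correctly:", accuracy)
```

Calling a gene active when its estimated probability exceeds 0.5 is an arbitrary threshold used here for simplicity. A fully Bayesian treatment of the kind the abstract describes would instead report posterior probabilities that account for uncertainty in the mixture parameters.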
