Authors: Thasso Griebel, Benedikt Zacher, Paolo Ribeca, Emanuele Raineri, Vincent Lacroix
DOI: 10.1093/NAR/GKS666
Keywords:
Abstract: High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common, and currently indispensable, technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence the abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models reproduce well the evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique employed by most RNA-Seq protocols.