Author: Justin Block Kinney
DOI:
Keywords:
Abstract: In the post-genomics era, DNA sequence itself is becoming a medium by which to probe biological phenomena. With the advent of microarray technology, and more recently ultra-high-throughput sequencing, large sequence data sets are becoming standard products of day-to-day research. Yet as software for analyzing such data proliferates, a fundamental understanding of how such data should be used to gain biophysical insight is missing from the literature. The focus of this thesis is on developing tools for characterizing the biophysical interactions underlying transcriptional regulation: the ability of cells to control which genes they transcribe into mRNA, and thus express as protein. We begin by presenting the basic principles of sequence data analysis, specifically for data consisting of many sequences σ, each accompanied by a (perhaps very noisy) measurement z of biophysical functionality. A salient feature of the experiments that produce such {σ, z} data is the difficulty of characterizing experimental noise a priori. We overcome this obstacle by introducing error-model-averaged (EMA) likelihood, which allows biophysical models of arbitrary functional form to be rigorously fit to data. EMA likelihood is closely related to mutual information, but its probabilistic interpretation provides some advantages. We demonstrate EMA likelihood's utility on previously published data, using Metropolis Monte Carlo sampling to infer the DNA-binding energy models of transcription factor proteins. The ability to properly analyze sequence data leads us to propose a new experimental assay, called Sort-Seq. This technique uses ultra-high-throughput sequencing to probe the protein-DNA and protein-protein interactions underlying transcriptional regulation at specific genomic loci. We present a proof-of-principle Sort-Seq experiment probing the lacZ promoter of E. coli, which we use to characterize the sequence-dependent binding energy of CRP. We then discuss what one can, in principle, infer from Sort-Seq data sets. We show that, with enough data and multiple proteins per sequence, one is able to infer both protein-DNA and protein-protein interaction energies in absolute thermal units. We conclude that, with ultra-high-throughput sequencing, DNA sequence might provide a sensitive means by which to probe in vivo biophysics.