BIOPHYSICAL MODELS OF TRANSCRIPTIONAL REGULATION FROM SEQUENCE DATA

Author: Justin Block Kinney

Abstract: In the post-genomics era, DNA sequence itself is becoming a medium by which to probe biological phenomena. With the advent of microarray technology, and ultra-high-throughput sequencing more recently, large sequence data sets are becoming standard products of day-to-day research. Yet as software for analyzing such data proliferates, a fundamental understanding of how sequence data should be used to gain biophysical insight is missing from the literature. The focus of this thesis is on developing tools for characterizing the biophysical interactions underlying transcriptional regulation: the ability of cells to control which genes they transcribe into mRNA, and thus express as protein. We begin by presenting basic principles for the analysis of sequence data, specifically data in which each sequence is accompanied by a (perhaps very noisy) measurement z of biophysical functionality. A salient feature of experiments that produce such data is the difficulty of characterizing experimental noise a priori. We overcome this obstacle by introducing error-model-averaged (EMA) likelihood, which allows biophysical models of arbitrary functional form to be rigorously fit to data. EMA likelihood is closely related to mutual information, but its probabilistic interpretation provides some advantages. We demonstrate EMA likelihood's utility on previously published data, using Metropolis Monte Carlo sampling to infer DNA-binding energy models for transcription factor proteins. The need to properly analyze sequence data leads us to propose a new experimental assay, called Sort-Seq. This technique uses ultra-high-throughput sequencing to probe the protein-DNA and protein-protein interactions that underlie transcriptional regulation at specific genomic loci. We present a proof-of-principle Sort-Seq experiment probing the lacZ promoter of E. coli, which we use to characterize the sequence-dependent binding energy of CRP. We then discuss what one can, in principle, infer from Sort-Seq data sets. We show that, with enough data on multiple proteins per sequence, one is able to infer both protein-DNA and protein-protein interaction energies in absolute thermal units. We conclude that, with ultra-high-throughput sequencing, DNA sequence might provide a sensitive means by which to probe in vivo biophysics.
