Authors: Alexander J. Hartemink, Raluca M. Gordan
DOI:
Keywords:
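Abstract: The initiation of two major processes in the eukaryotic cell, gene transcription and DNA replication, is regulated largely through interactions between proteins or protein complexes and DNA. Although a lot is known about the interacting proteins and their role in regulating transcription and replication, the specific DNA binding motifs of many regulatory proteins and complexes are still to be determined. For this purpose, many computational tools for DNA motif discovery have been developed over the last decades. These tools employ a variety of strategies, from exhaustive search to sampling techniques, with the hope of finding over-represented motifs in sets of co-regulated or co-bound sequences. Despite the variety of computational tools aimed at solving the problem of motif discovery, their ability to correctly detect known DNA motifs is still limited. The motifs are usually short and many times degenerate, which makes them difficult to distinguish from the genomic background. We believe the most efficient strategy for improving the performance of motif discovery is not to use increasingly complex computational and statistical methods and models, but to incorporate more of the biology into the computational techniques, in a principled manner. To this end, we propose a novel motif discovery algorithm: PRIORITY. Based on a general Gibbs sampling framework, PRIORITY has a major advantage over other motif discovery tools: it can incorporate different types of biological information (e.g., nucleosome positioning information) to guide the search for DNA binding sites toward regions where these sites are more likely to occur (e.g., nucleosome-free regions). We use transcription factor (TF) binding data from yeast chromatin immunoprecipitation (ChIP-chip) experiments to test the performance of our motif discovery algorithm when incorporating three types of biological information: nucleosome positioning, DNA double-helical stability, and evolutionary conservation information. In each case, incorporating additional biological information has proven very useful in increasing the accuracy of motif finding, with the number of correctly identified motifs increasing by up to 52%. PRIORITY is not restricted to TF binding data. In this work, we also analyze origin recognition complex (ORC) binding data and show that PRIORITY can utilize DNA structural information to predict the binding specificity of ORC. Despite the improvement obtained using additional biological information, the success of motif discovery algorithms in identifying known motifs is still limited, especially when applied to sequences bound in vivo (such as those of ChIP-chip), because the observed protein-DNA interactions are not necessarily direct. Some TFs associate with DNA only indirectly via protein partners, while others exhibit both direct and indirect binding. We propose a novel method to distinguish between direct and indirect TF-DNA interactions, integrating in vivo TF binding data, in vivo nucleosome occupancy data, and in vitro motifs from protein binding microarrays. When applied to yeast ChIP-chip data, our method reveals that only 48% of the ChIP-chip data sets can be readily explained by direct binding of the profiled TF, while 16% can be explained by indirect DNA binding. In the remaining 36%, we found that none of the motifs used in our analysis was able to explain the ChIP-chip data, either because the data was too noisy or because the set of motifs was incomplete. As more in vitro motifs become available, our method can be used to build a more complete catalog of direct and indirect TF-DNA interactions.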