Digging Deep for Ancient Relics: A Survey of Protein Motifs in the Intergenic Sequences of Four Eukaryotic Genomes

Authors: Zhao Lei Zhang, Paul M Harrison, Mark Gerstein

DOI: 10.1016/S0022-2836(02)01035-5

Abstract: We have examined conserved protein motifs in the non-coding, intergenic regions (“pseudomotif patterns”) and surveyed their occurrence in the fly, worm, yeast and human genomes (human chromosomes 21 and 22 only). To identify these patterns, we masked out annotated genes, pseudogenes and repeats from the raw genomic sequence, then compared the remaining sequence, in six-frame translation, against 1319 patterns from the PROSITE database. For each pseudomotif pattern, the absolute number of occurrences is not very informative unless compared with a statistical expectation; consequently, we calculated the expected occurrence of each pattern using a Poisson model and verified this with simulations. Using a p-value cut-off of 0.01, we found 67 over-represented patterns in fly intergenic regions, 34 in worm and six in yeast. These include the zinc finger, leucine zipper, nucleotide-binding motif and EGF domain. Many over-represented patterns were common to two or more organisms, but there were a few that were unique to specific ones. Furthermore, we found more over-represented patterns in fly than in worm, although fly has fewer pseudogenes. This puzzling observation can be explained by a higher deletion rate in the fly genome. We also surveyed under-represented patterns, finding 23 in fly, 12 in worm and 18 in human. If intergenic sequences were truly random, we would expect an equal number of over- and under-represented patterns. The fact that, for each organism, the number of over-represented patterns is greater than the number of under-represented ones implies that a fraction of the intergenic regions consist of ancient protein fragments that, due to accumulated disablements, have become unrecognizable by conventional techniques for gene and pseudogene identification. Moreover, we find that, in aggregate, the over-represented pseudomotif patterns occupy a substantial fraction of the intergenic regions. Further information is available at
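
The Poisson test described in the abstract can be illustrated with a minimal sketch. It assumes the expected count for a pattern is already known and is passed in as `expected`; the abstract does not say how that expectation was derived, so the function names, the inputs and the use of two one-sided tails here are illustrative assumptions, not the authors' implementation. Only the 0.01 cut-off is taken from the abstract.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_tail_pvalues(observed, expected):
    """Return (P(X >= observed), P(X <= observed)) for X ~ Poisson(expected).

    `expected` must be > 0. How it is estimated is outside this sketch.
    """
    lower = sum(poisson_pmf(k, expected) for k in range(observed + 1))
    # P(X >= observed) = 1 - P(X <= observed) + P(X = observed)
    upper = max(0.0, 1.0 - lower + poisson_pmf(observed, expected))
    return upper, min(1.0, lower)


def classify(observed, expected, cutoff=0.01):
    """Label a pattern using the 0.01 p-value cut-off quoted in the abstract."""
    p_over, p_under = poisson_tail_pvalues(observed, expected)
    if p_over < cutoff:
        return "over-represented"
    if p_under < cutoff:
        return "under-represented"
    return "as expected"
```

For example, with a hypothetical expectation of 10 occurrences, `classify(30, 10.0)` flags the pattern as over-represented and `classify(1, 10.0)` as under-represented, while `classify(10, 10.0)` is consistent with chance.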
