Authors: Zhao Lei Zhang, Paul M Harrison, Mark Gerstein
DOI: 10.1016/S0022-2836(02)01035-5
Keywords:
Abstract: We have examined conserved protein motifs in the non-coding, intergenic regions (“pseudomotif patterns”) and surveyed their occurrence in the fly, worm, yeast and human genomes (chromosomes 21 and 22 only). To identify these patterns, we masked out annotated genes, pseudogenes and repeat regions from the raw genomic sequence and then compared the remaining sequence, in six-frame translation, against 1319 patterns from the PROSITE database. For each pseudomotif pattern, the absolute number of occurrences is not very informative unless compared against a statistical expectation; consequently, we calculated the expected occurrence of each pattern using a Poisson model and verified this with simulations. Using a p-value cut-off of 0.01, we found 67 pseudomotif patterns over-represented in fly intergenic regions, 34 in worm and six in yeast. These include the zinc finger, leucine zipper, nucleotide-binding motif and EGF domain. Many of the over-represented patterns were common to two or more organisms, but there were a few that were unique to specific ones. Furthermore, we found more over-represented patterns in fly than in worm, although fly has fewer pseudogenes. This puzzling observation can be explained by a higher deletion rate in the fly genome. We also surveyed under-represented patterns, finding 23 in fly, 12 in worm and 18 in human. If intergenic sequences were truly random, we would expect an equal number of over- and under-represented patterns. The fact that, for each organism, the number of over-represented patterns is greater than the number of under-represented ones implies that a fraction of the intergenic regions consists of ancient protein fragments that, due to accumulated disablements, have become unrecognizable by conventional techniques for gene and pseudogene identification. Moreover, we find that, in aggregate, the over-represented pseudomotif patterns occupy a substantial fraction of the intergenic regions. Further information is available at
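The over- and under-representation test described in the abstract can be illustrated with a minimal sketch. It assumes that an expected count per pattern has already been obtained by some means (the abstract does not say how it was computed) and that it is strictly positive. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space; assumes lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_upper_tail(k, lam):
    """P(X >= k): chance of seeing at least k occurrences."""
    if k <= 0:
        return 1.0
    return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))

def poisson_lower_tail(k, lam):
    """P(X <= k): chance of seeing at most k occurrences."""
    return min(1.0, sum(poisson_pmf(i, lam) for i in range(k + 1)))

def classify(observed, expected, cutoff=0.01):
    """Label a pattern 'over', 'under' or 'neither' at the given p-value cut-off.

    `observed` is the number of matches found; `expected` is the Poisson mean
    (a hypothetical input here, supplied by the caller).
    """
    if poisson_upper_tail(observed, expected) < cutoff:
        return "over"
    if poisson_lower_tail(observed, expected) < cutoff:
        return "under"
    return "neither"

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for obs, exp in [(30, 10.0), (1, 10.0), (10, 10.0)]:
        print(obs, exp, classify(obs, exp))
```

Each tail is tested separately against the cut-off, so a pattern is flagged only when its observed count is improbably high or improbably low under the Poisson expectation. The subtraction in the upper tail loses precision for extremely small p-values, which does not matter at a cut-off of 0.01.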