Authors: M. Clamp, B. Fry, M. Kamal, X. Xie, J. Cuff
Keywords:
Abstract: Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of ≈24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to ≈20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.