Authors: Deanna M Church, Leo Goodstadt, LaDeana W Hillier, Michael C Zody, Steve Goldstein
DOI: 10.1371/JOURNAL.PBIO.1000112
Keywords: Sequence assembly, Biology, Comparative genomics, Gene, Whole genome sequencing, Population, Gene family, Human genome, Genetics, Genome, Computational biology
Abstract: The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.