Authors: K. D. Pruitt, T. Tatusova, G. R. Brown, D. R. Maglott
DOI: 10.1093/NAR/GKR1079
Keywords:
Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives, and they represent a significant reduction in redundancy compared with the volume of data archived by the International Nucleotide Sequence Database Collaboration. RefSeq includes over 16,000 organisms, 2.4 × 10^6 genomic records, 13 × 10^6 proteins and 2 × 10^6 RNA records, spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). RefSeq is maintained through a combined approach of automated analyses, collaboration and manual curation. This approach generates an up-to-date representation of each sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation, and the current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).