Authors: Boas Pucker, Daniela Holtgräwe, Bernd Weisshaar
DOI: 10.1186/S13104-017-2985-Y
Keywords: Gene, Biology, Whole genome sequencing, Sequence assembly, Untranslated region, Genome project, Genome, splice, Computational biology, Gene prediction
Abstract: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring information from the recently released Columbia-0 reference genome sequence annotation Araport11. Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mappings and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11’s information, which was generated using high coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene set (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against GeneSet_Nd-1_v1.1 indicate a high gene prediction quality.