Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence

Authors: Boas Pucker, Daniela Holtgräwe, Bernd Weisshaar

DOI: 10.1186/S13104-017-2985-Y

Keywords: Gene, Biology, Whole genome sequencing, Sequence assembly, Untranslated region, Genome project, Genome, Splice, Computational biology, Gene prediction

Abstract: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring information from the recently released Columbia-0 reference genome sequence annotation Araport11. Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11's information, which was generated using high coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene set (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against GeneSet_Nd-1_v1.1 indicate a high gene prediction quality.
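To make the term "non-canonical splice site" concrete: an intron is conventionally called canonical when it starts with the dinucleotide GT and ends with AG, and non-canonical otherwise. The following is a minimal sketch of that check, not the authors' pipeline. The function name `classify_intron`, the coordinate convention (0-based, end-exclusive, forward strand only), and the toy sequence are all assumptions made for illustration.

```python
def classify_intron(genome_seq: str, start: int, end: int) -> str:
    """Classify an intron by its terminal dinucleotides.

    Assumes 0-based, end-exclusive coordinates on the forward strand
    (an illustrative convention, not taken from the paper).
    """
    intron = genome_seq[start:end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = intron[:2], intron[-2:]
    # GT at the 5' end and AG at the 3' end define the canonical case.
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    return "non-canonical"


# Toy sequence (hypothetical): exon, GT-AG intron, exon, GC-AG intron, exon.
seq = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC" + "GCAAGTTTTCAG" + "TAA"
introns = [(6, 18), (24, 36)]
labels = [classify_intron(seq, s, e) for s, e in introns]
print(labels)
```

A real annotation would also need to handle introns on the reverse strand, for example by reverse-complementing the sequence before the check.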
