Evaluation of Different Reference Based Annotation Strategies Using RNA-Seq – A Case Study in Drosophila pseudoobscura

Authors: Nicola Palmieri, Viola Nolte, Anton Suvorov, Carolin Kosiol, Christian Schlötterer

DOI: 10.1371/JOURNAL.PONE.0046415

Keywords: FlyBase: A Database of Drosophila Genes & Genomes; Biology; Genome project; Sequence analysis; Drosophila pseudoobscura; Expressed sequence tag; Annotation; Genetics; Computational biology; Genome; Intron

Abstract: RNA-Seq is a powerful tool for the annotation of genomes, in particular for the identification of isoforms and UTRs. Nevertheless, although several software tools exist, no standard strategy to obtain a reliable annotation has yet been established. We tested different combinations of the most commonly used reference-based alignment tools (TopHat, GSNAP) in combination with two frequently used assemblers (Cufflinks, Scripture) and evaluated their potential to improve the annotation of Drosophila pseudoobscura. While GSNAP maps a higher proportion of reads, TopHat resulted in a more accurate annotation when combined with Cufflinks. Scripture had the lowest sensitivity. Interestingly, after subsampling the GSNAP alignments to the same coverage as TopHat, we find that both mappers have similar performance, implying that the advantage of TopHat is mainly an artifact of its lower coverage. Overall, we observed low concordance among the approaches at both the junction and the isoform levels. Using data from adults of both sexes of multiple D. pseudoobscura strains, we detected alternative splicing in about 30% of the FlyBase multiple-exon genes. Moreover, we extended the boundaries of 6523 genes (about 40%). We annotated 669 new genes, 45% of them with supporting evidence. Most of these are located on unassembled contigs, reflecting their incomplete annotation. Finally, we identified 99 additional genes that are not represented in the current genome contigs of D. pseudoobscura, probably due to their location in genomic regions that are difficult to assemble (e.g. heterochromatic regions).
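The abstract compares pipelines by sensitivity, by concordance at the junction level, and after subsampling reads to equal coverage. The following is a minimal sketch of those three operations, not the study's actual evaluation code. It assumes splice junctions are represented as hypothetical (contig, strand, donor, acceptor) tuples, that "specificity" is used in the gene-prediction sense (the fraction of predictions that are correct), and that concordance between two pipelines is measured as Jaccard overlap; all of these are illustrative choices. The junction sets in the usage lines are toy values, not data from the paper.

```python
import random
from typing import Iterable, List, Set, Tuple

# Hypothetical junction key: (contig, strand, donor position, acceptor position).
Junction = Tuple[str, str, int, int]


def sensitivity_specificity(predicted: Set[Junction],
                            reference: Set[Junction]) -> Tuple[float, float]:
    """Junction-level Sn = TP / (TP + FN) and Sp = TP / (TP + FP).

    Sp follows the gene-prediction convention in which "specificity"
    is the fraction of predicted junctions that are correct (precision).
    """
    tp = len(predicted & reference)
    sn = tp / len(reference) if reference else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


def concordance(a: Set[Junction], b: Set[Junction]) -> float:
    """Jaccard overlap of the junction sets reported by two pipelines (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def subsample_reads(read_ids: Iterable[str], target: int, seed: int = 0) -> List[str]:
    """Uniformly keep `target` reads so two mappers can be compared at equal coverage."""
    reads = list(read_ids)
    if target >= len(reads):
        return reads
    return random.Random(seed).sample(reads, target)


# Toy usage: a reference annotation and junctions from two hypothetical pipelines.
ref = {("contig_A", "+", 100, 200), ("contig_A", "+", 300, 400), ("contig_B", "-", 50, 90)}
tophat = {("contig_A", "+", 100, 200), ("contig_B", "-", 50, 90)}
gsnap = {("contig_A", "+", 100, 200), ("contig_A", "+", 300, 400), ("contig_A", "+", 500, 600)}

print(sensitivity_specificity(tophat, ref))  # (0.6666666666666666, 1.0)
print(sensitivity_specificity(gsnap, ref))   # (0.6666666666666666, 0.6666666666666666)
print(concordance(tophat, gsnap))            # 0.25
```

In this toy case both pipelines recover the same share of reference junctions, but the one reporting more junctions pays for it in specificity, which mirrors why the abstract checks whether an apparent accuracy difference persists once coverage is equalized. For paired-end data one would subsample read pairs rather than single reads.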
