Authors: Xin Huang, Xiao-Guang Chen, Peter A. Armbruster
DOI: 10.1186/S12864-016-2923-8
Keywords:
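Abstract: The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq). However, the optimal strategy for assembling vast amounts of short RNA-Seq reads remains unresolved, especially for organisms without a sequenced genome. This study compared four transcriptome assembly methods, including a widely used de novo assembler (Trinity), two transcriptome re-assembly strategies utilizing proteomic and genomic resources from closely related species (reference-based re-assembly and TransPS), and a genome-guided assembler (Cufflinks). These four assembly strategies were compared using a comprehensive transcriptomic database of Aedes albopictus, for which a genome sequence has recently been completed. The quality of the various assemblies was assessed by the number of contigs generated, contig length distribution, percent paired-end read mapping, and gene model representation via BLASTX. Our results reveal that de novo assembly generates a similar number of gene models relative to genome-guided assembly with a fragmented reference, but produces the highest level of redundancy and requires the most computational power. Using a closely related reference genome to guide transcriptome assembly can generate biased contig sequences. Increasing the number of reads used in the transcriptome assembly tends to increase the redundancy within the assembly and decrease both median contig length and percent identity between contigs and reference protein sequences. This study provides general guidance for transcriptome assembly of RNA-Seq data for organisms with or without a sequenced genome. The optimal assembly strategy will depend upon the subsequent downstream analyses. Our results emphasize the efficacy of de novo assembly, which can be as effective as genome-guided assembly when the reference genome is fragmented. If a genome assembly and sufficient computational resources are available, it can be beneficial to combine de novo and genome-guided assemblies. Caution should be taken when using a closely related reference genome to guide transcriptome assembly. The quantity of read pairs used in the transcriptome assembly does not necessarily correlate with the quality of the assembly.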