Authors: Julie M Allen, Bret Boyd, Nam-Phuong Nguyen, Pranjal Vachaspati, Tandy Warnow
Keywords: Genomics, Biology, Computational biology, Phylogenomics, Genome, Phylogenetic tree, Sequence analysis, Whole genome sequencing, DNA sequencing, Contig
Abstract: Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently, the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes ($<1000$ Mbp), it is feasible to sequence the entire genome at modest coverage ($10-30\times$). Computational challenges in handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of the automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above $5-10\times$, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect that full genome sequencing will become more feasible for a wider array of organisms, and that aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; assembly; sequencing; phylogenomics.]