Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal new biological insights

Authors: RP Baptista, Y Li, A Sateriale, MJ Sanders, KL Brooks

DOI: 10.1101/2021.01.29.428682

Keywords:

Abstract: Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community still relies on a fragmented reference genome sequence from 2004. Incomplete reference sequences hamper experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species: C. parvum, C. hominis and C. tyzzeri. The new C. parvum IOWA reference genome assembly is larger, gap free and lacks ambiguous bases. This chromosomal assembly recovers 13 of 16 possible telomeres and raises a new hypothesis for the remaining telomeres and associated subtelomeric regions. Comparative annotation revealed that most “missing” orthologs are found, suggesting that species differences result primarily from structural rearrangements, gene copy number variation and SNVs in C. parvum, C. hominis and C. tyzzeri. We made >1,500 C. parvum annotation updates based on experimental evidence. They included new transporters, ncRNAs, introns and altered gene structures. The new assembly and annotation revealed a complete DNA methylase Dnmt2 ortholog. 190 genes under positive selection, including many new candidates, were identified using the new assembly and annotation as reference. Finally, possible subtelomeric amplification and variation events in C. parvum are detected that reveal a new level of genome plasticity that will both inform and impact future research.

References (98)
Aaron R. Quinlan, Ira M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, vol. 26, pp. 841-842 (2010), 10.1093/BIOINFORMATICS/BTQ033
H. Li, R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, vol. 25, pp. 1754-1760 (2009), 10.1093/BIOINFORMATICS/BTP324
Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell, Steven L Salzberg, StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, vol. 33, pp. 290-295 (2015), 10.1038/NBT.3122
Ambrish Roy, Jianyi Yang, Yang Zhang, COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Research, vol. 40, pp. 471-477 (2012), 10.1093/NAR/GKS372
Pablo Cingolani, Adrian Platts, Le Lily Wang, Melissa Coon, Tung Nguyen, Luan Wang, Susan J. Land, Xiangyi Lu, Douglas M. Ruden, A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, vol. 6, pp. 80-92 (2012), 10.4161/FLY.19695
Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi, Glenn Tesler, QUAST: quality assessment tool for genome assemblies. Bioinformatics, vol. 29, pp. 1072-1075 (2013), 10.1093/BIOINFORMATICS/BTT086
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, The Sequence Alignment/Map format and SAMtools. Bioinformatics, vol. 25, pp. 2078-2079 (2009), 10.1093/BIOINFORMATICS/BTP352
Alexandre Lomsadze, Vardges Ter-Hovhannisyan, Yury O Chernoff, Mark Borodovsky, Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, vol. 33, pp. 6494-6506 (2005), 10.1093/NAR/GKI937
Mathieu Gissot, Sang-Woon Choi, Reid F. Thompson, John M. Greally, Kami Kim, Toxoplasma gondii and Cryptosporidium parvum Lack Detectable DNA Cytosine Methylation. Eukaryotic Cell, vol. 7, pp. 537-540 (2008), 10.1128/EC.00448-07