Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer.

Authors: Qian Zhang, Se-Ran Jun, Michael Leuze, David Ussery, Intawat Nookaew

DOI: 10.1038/SREP40712

Abstract: The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral sequences and a reference set of ~4000 that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral "tree of life". However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of the complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) the average number of common features among genomes, and (3) the Shannon diversity index. This strategy was applied to all 3,905 genomes, and the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and with the Baltimore classification.
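To make the abstract's quantities concrete, here is a minimal Python sketch of a k-mer profile and two of the three criteria it names: the number of features shared between genomes and the Shannon diversity index. It is an illustration under assumptions, not the authors' implementation. The abstract does not give their exact definitions, normalisation, or thresholds, and cumulative relative entropy is omitted because it depends on those details. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (no handling of ambiguous bases)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over the observed k-mer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def common_features(counts_a, counts_b):
    """Number of distinct k-mers that occur in both profiles."""
    return len(counts_a.keys() & counts_b.keys())


# Toy sequences (hypothetical, far shorter than any real viral genome).
genome_a = "ATGCGATACGCTTGA"
genome_b = "ATGCGTTACGCTAGA"

# Scan a range of k and report both statistics for each value.
for k in range(1, 6):
    a, b = kmer_counts(genome_a, k), kmer_counts(genome_b, k)
    print(k, round(shannon_index(a), 3), common_features(a, b))
```

As k grows, the number of shared k-mers between unrelated sequences tends to fall while the diversity of each profile rises, which is why a scan over k like this can help in choosing a feature length.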
