Mapping the Space of Genomic Signatures

Authors: Lila Kari, Kathleen A. Hill, Abu S. Sayem, Rallis Karamichalis, Nathaniel Bryans

DOI: 10.1371/JOURNAL.PONE.0119815

Keywords: Taxonomic rank, Distance transform, Genome, Computational biology, Genomic signature, Sequence alignment, Biology, Comparative genomics, Genetics, DNA sequencing, Sequence analysis

Abstract: We propose a computational method to measure and visualize interrelationships among any number of DNA sequences. It allows, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map. Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, the Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM for more than 5 million pairs of complete mitochondrial genomes. We then used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment. It can thus be used to compare similar or vastly different DNA sequences, whether genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. The method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp). It likewise finds that the sequence most different from it belongs to a cucumber.
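The pipeline summarized above has three steps: a CGR image per sequence, pairwise DSSIM between images, then MDS to place the sequences on a map. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The corner layout (A bottom-left, C top-left, G top-right, T bottom-right) is assumed. So are the per-image max normalization, the single global SSIM window with DSSIM = 1 - SSIM (other conventions halve this value), and the use of classical (Torgerson) MDS. The function names `fcgr`, `to_image`, `dssim`, `distance_matrix` and `classical_mds` are hypothetical.

```python
import numpy as np

# Assumed CGR corner layout, encoded as (x_bit, y_bit):
# A bottom-left, C top-left, G top-right, T bottom-right.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> np.ndarray:
    """Hypothetical helper: frequency CGR, counting every k-mer in a 2^k x 2^k grid.

    The most recent nucleotide of a k-mer sets the most significant bit of the
    cell's x and y indices, matching where the CGR point lands at resolution 2^k.
    k-mers containing symbols other than A, C, G, T are skipped.
    """
    size = 2 ** k
    grid = np.zeros((size, size), dtype=float)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in CORNER_BITS for ch in kmer):
            continue
        x = y = 0
        for j, ch in enumerate(kmer):
            bx, by = CORNER_BITS[ch]
            x |= bx << j
            y |= by << j
        grid[size - 1 - y, x] += 1  # row 0 is the top of the image
    return grid


def to_image(grid: np.ndarray) -> np.ndarray:
    """Assumed normalization: scale counts to [0, 1] so sequence length cancels out."""
    peak = grid.max()
    return grid / peak if peak > 0 else grid


def dssim(img_a: np.ndarray, img_b: np.ndarray, dynamic_range: float = 1.0) -> float:
    """Simplified DSSIM = 1 - SSIM, with SSIM computed over one global window."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    a, b = img_a.ravel(), img_b.ravel()
    mu_a, mu_b = a.mean(), b.mean()
    da, db = a - mu_a, b - mu_b
    var_a, var_b, cov = np.mean(da * da), np.mean(db * db), np.mean(da * db)
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    )
    return float(1.0 - ssim)


def distance_matrix(seqs, k: int) -> np.ndarray:
    """Symmetric matrix of pairwise DSSIM values between the sequences' FCGR images."""
    imgs = [to_image(fcgr(s, k)) for s in seqs]
    n = len(imgs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = dssim(imgs[i], imgs[j])
    return d


def classical_mds(d: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical MDS: double-centre the squared distances, keep the top eigenvectors."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]
    lam = np.clip(vals[order], 0.0, None)  # drop small negative eigenvalues
    return vecs[:, order] * np.sqrt(lam)


if __name__ == "__main__":
    seqs = ["ACGTACGTTTGACCA" * 20, "ACGTACGTTTGACCA" * 20, "GGGGCCCCAAAATTTT" * 20]
    d = distance_matrix(seqs, k=3)
    print(np.round(d, 3))
    print(np.round(classical_mds(d), 3))
```

The abstract uses k = 9, which gives 512 x 512 images per mitochondrial genome. The small k in the demo only keeps it fast. Each row of the MDS output gives the two map coordinates of one sequence, so identical sequences collapse onto the same point.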
