Authors: Lila Kari, Kathleen A. Hill, Abu S. Sayem, Rallis Karamichalis, Nathaniel Bryans
DOI: 10.1371/JOURNAL.PONE.0119815
Keywords: Taxonomic rank, Distance transform, Genome, Computational biology, Genomic signature, Sequence alignment, Biology, Comparative genomics, Genetics, DNA sequencing, Sequence analysis
Abstract: We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber.
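The first step of the pipeline described in the abstract, turning a DNA sequence into a CGR image whose cells count oligomers of length k, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cgr_counts` and `euclidean_distance`, the assignment of nucleotides to corners, and the handling of ambiguous bases are all assumptions made for illustration. In particular, the plain Euclidean distance below is only a stand-in to show where a pairwise image distance fits; the paper uses DSSIM, followed by MDS on the resulting distance matrix.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Assumed corner layout (x, y) of the unit square for each nucleotide.
CORNER_BITS: Dict[str, Tuple[int, int]] = {
    "A": (0, 0),
    "C": (0, 1),
    "G": (1, 1),
    "T": (1, 0),
}


def cgr_counts(seq: str, k: int) -> List[List[int]]:
    """Return a 2^k x 2^k grid counting the k-mers of seq at their CGR cells.

    In a CGR, each new point lies halfway between the previous point and the
    corner of the current nucleotide, so the cell at resolution 2^k depends
    only on the last k nucleotides. The cell index is tracked with integer
    bit shifts to avoid floating-point rounding.
    """
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    col = row = 0
    run = 0  # number of consecutive unambiguous nucleotides seen so far
    for base in seq.upper():
        bits = CORNER_BITS.get(base)
        if bits is None:
            # Assumption: an ambiguous symbol (e.g. N) breaks the k-mer window.
            run = 0
            continue
        cx, cy = bits
        # The most recent nucleotide becomes the most significant bit.
        col = (col >> 1) | (cx << (k - 1))
        row = (row >> 1) | (cy << (k - 1))
        run += 1
        if run >= k:
            grid[row][col] += 1
    return grid


def euclidean_distance(a: List[List[int]], b: List[List[int]]) -> float:
    """Stand-in image distance between two grids of equal size (not DSSIM)."""
    total = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
    return sqrt(total)


if __name__ == "__main__":
    g1 = cgr_counts("ACGTACGTTTGA", 2)
    g2 = cgr_counts("ACGTACGATTGA", 2)
    print(euclidean_distance(g1, g2))
```

With k = 9, as in the abstract, each sequence would map to a 512 x 512 grid. Computing a distance for every pair of grids yields the symmetric matrix that an MDS routine would then embed in two or three dimensions.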