Computational statistics in molecular phylogenetics

Author: W.A.J. Fletcher

Keywords: Molecular phylogenetics; Computational statistics; Computer science; Nonsynonymous substitution; Data mining; Robustness (computer science); Synonymous substitution; Alignment-free sequence analysis; Indel; False positive paradox

Abstract: Simulation remains a very important approach to testing the robustness and accuracy of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions (indels). In this thesis I implement a new, portable and flexible application, named INDELible, which can be used to generate nucleotide, amino acid and codon sequence data by simulating indels (under several models of indel length distribution) as well as substitutions (under a rich repertoire of substitution models). In particular, I introduce a study that makes use of one of INDELible's many unique features: the ability to simulate sequences with indels under codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. This is used to quantify, for the first time, the precise effects of alignment errors on the false-positive rate and power of the widely used branch-site test of positive selection. Several alignment programs are assessed in this context. Through this experiment, I show that indels do not cause excessive false positives if the alignment is correct, but that alignment errors lead to unacceptably high false-positive rates. Previous positive selection studies that used inferior alignment programs are revisited to demonstrate the applicability of my results in real-world situations. Further work uses data simulated with INDELible to examine the effects of tree shape and branch length on alignment programs, and the impact of alignment errors on different methods of phylogeny reconstruction. Analysis is also performed to explore which alignment programs avoid generating the kinds of errors that are most detrimental to the process of …
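The core idea behind indel simulation can be sketched in a few lines. A sequence evolves along a branch through random insertion and deletion events whose lengths follow a chosen length distribution. This is a minimal illustration under simplifying assumptions, not INDELible's own algorithm. The function names and parameters (`ins_rate`, `del_rate`, exponent `a`, `max_len`) are hypothetical, and rates are per site per unit branch length. Indel lengths follow a truncated power law, P(u) proportional to u^(-a), which is assumed here as one common choice. Inserted residues are drawn uniformly from the alphabet rather than from a substitution model's stationary frequencies. Substitutions and true-alignment bookkeeping are omitted.

```python
import random


def sample_indel_length(a, max_len, rng):
    """Draw an indel length u in 1..max_len with P(u) proportional to u**(-a).

    Hypothetical helper: a truncated power-law (Zipf-like) length distribution.
    """
    lengths = range(1, max_len + 1)
    weights = [u ** (-a) for u in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]


def evolve_branch(seq, t, ins_rate, del_rate, a, max_len, alphabet, rng):
    """Gillespie-style simulation of indels along one branch of length t.

    Simplified, hypothetical model (not INDELible's): insertions can occur at
    any of the len(seq) + 1 gaps between sites, deletions start at any existing
    site, and no substitutions are applied.
    """
    seq = list(seq)
    time = 0.0
    while True:
        n = len(seq)
        total = ins_rate * (n + 1) + del_rate * n  # total event rate
        if total == 0:
            break  # nothing can happen (no rates, or empty seq with no insertions)
        time += rng.expovariate(total)  # exponential waiting time to next event
        if time > t:
            break  # next event would fall beyond the end of the branch
        u = sample_indel_length(a, max_len, rng)
        if rng.random() < ins_rate * (n + 1) / total:
            pos = rng.randint(0, n)  # gap position, both ends included
            # Assumption: inserted residues are uniform over the alphabet.
            seq[pos:pos] = rng.choices(alphabet, k=u)
        else:
            start = rng.randrange(n)  # deletion begins at an existing site
            del seq[start:start + u]  # deletions past the 3' end are truncated
    return "".join(seq)
```

For example, `evolve_branch(root, 0.1, 0.05, 0.05, 1.7, 10, "ACGT", random.Random(1))` evolves a seeded root sequence along a branch of length 0.1. The parameter values are arbitrary. A full simulator would also record which sites remain homologous. That true alignment is what allows alignment errors to be measured on simulated data.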
