Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies

Authors: Davide Heller, Damian Szklarczyk, Christian von Mering

DOI: 10.1186/S12859-019-2828-Z

Abstract: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Differently from previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline .
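
The core idea in the abstract, subsampling OG members and aggregating per-sample reconciliation outcomes, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function `resolve_by_subsampling`, the `"split"`/`"merge"` labels, the majority-vote aggregation, and the `toy_reconcile` stand-in are all assumptions made for illustration. A real run would build a gene tree for each sample and reconcile it against the species tree instead of using the toy rule.

```python
import random
from collections import Counter
from typing import Callable, Sequence, Tuple


def resolve_by_subsampling(
    members: Sequence[str],
    reconcile: Callable[[Sequence[str]], str],
    n_samples: int = 21,
    sample_size: int = 10,
    seed: int = 0,
) -> Tuple[str, Counter]:
    """Aggregate reconciliation outcomes over random subsamples of OG members.

    `reconcile` is a hypothetical callable that takes a small set of protein
    identifiers and returns a decision label (here "split" or "merge").
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = min(sample_size, len(members))
    votes: Counter = Counter()
    for _ in range(n_samples):
        sample = rng.sample(list(members), k)  # draw k members without replacement
        votes[reconcile(sample)] += 1
    decision, _ = votes.most_common(1)[0]  # majority vote (assumed aggregation rule)
    return decision, votes


def toy_reconcile(sample: Sequence[str]) -> str:
    """Toy stand-in for gene tree-species tree reconciliation (assumption).

    Identifiers are assumed to look like "species|protein". If any species
    contributes more than one protein to the sample, report "split".
    """
    species = [protein_id.split("|")[0] for protein_id in sample]
    return "split" if len(set(species)) < len(species) else "merge"


if __name__ == "__main__":
    og_members = [f"sp{i}|{copy}" for i in range(8) for copy in ("a", "b")]
    decision, votes = resolve_by_subsampling(og_members, toy_reconcile)
    print(decision, dict(votes))
```

Because each sample is small, every individual tree is cheap to build and reconcile, and the vote over many samples is what provides robustness against a single misleading tree.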
