Authors: Davide Heller, Damian Szklarczyk, Christian von Mering
DOI: 10.1186/S12859-019-2828-Z
Keywords:
Abstract: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large-scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Unlike previous approaches, by subsampling the protein space we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show performance comparable to or greater than that of previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline.
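The sample-and-aggregate idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' pipeline: the function name `resolve_by_subsampling`, the `"merge"`/`"split"` labels, the majority-vote aggregation, and the default parameter values are all assumptions made for illustration. The `reconcile` callable stands in for a real gene tree-species tree reconciliation, which is not implemented here.

```python
import random
from collections import Counter
from typing import Callable, Sequence, Tuple


def resolve_by_subsampling(
    members: Sequence[str],
    reconcile: Callable[[Sequence[str]], str],
    sample_size: int = 10,
    n_samples: int = 25,
    seed: int = 0,
) -> Tuple[str, float]:
    """Draw small random subsets of OG members, obtain one decision per
    subset from `reconcile`, and return the majority decision together
    with the fraction of samples that supported it.

    Majority voting is an assumed aggregation rule, chosen for simplicity.
    """
    rng = random.Random(seed)          # fixed seed for reproducible sampling
    pool = list(members)
    k = min(sample_size, len(pool))    # never ask for more members than exist
    votes = Counter(reconcile(rng.sample(pool, k)) for _ in range(n_samples))
    decision, count = votes.most_common(1)[0]
    return decision, count / n_samples


def toy_reconcile(sample: Sequence[str]) -> str:
    """Hypothetical placeholder for tree building plus reconciliation.

    Names look like '<species>_<copy>'; the sample is called 'split' when
    it contains more than one paralogous copy, otherwise 'merge'.
    """
    copies = {name.split("_")[1] for name in sample}
    return "split" if len(copies) > 1 else "merge"


# Toy OG: three species, each with two paralogous copies (a and b).
toy_members = ["sp1_a", "sp2_a", "sp3_a", "sp1_b", "sp2_b", "sp3_b"]
decision, support = resolve_by_subsampling(toy_members, toy_reconcile, sample_size=4)
```

Because each sample is small, the expensive step only ever sees a small tree, and the support fraction gives a rough indication of how stable the decision is across samplings.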