Authors: Yuxiang Jiang, Wyatt T. Clark, Iddo Friedberg, Predrag Radivojac
DOI: 10.1093/BIOINFORMATICS/BTU472
Keywords:
Abstract: Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data, as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether and to what degree currently available data can be reliably used to train computational models and estimate their performance accuracy. Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, the performance metric and the underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable. Contact: predrag@indiana.edu Supplementary information: Supplementary data are available at Bioinformatics online.