The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective.

Authors: Yuxiang Jiang, Wyatt T. Clark, Iddo Friedberg, Predrag Radivojac

DOI: 10.1093/BIOINFORMATICS/BTU472

Abstract:

Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data, as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether, and to what degree, currently available data can be reliably used to train computational models and estimate their performance accuracy.

Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, the performance metric and the underlying ontology. However, under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable.

Contact: predrag@indiana.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
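The evaluate-then-re-evaluate scenario described in the Results can be illustrated with a small sketch. This is not the authors' code: the toy ontology (hypothetical terms `root`, `A`, `A1`, ...), the annotation sets, and the simple protein-centric precision and recall used here are assumptions chosen for illustration. Both the prediction and the annotation are propagated to all ancestor terms, so each is treated as a consistent subgraph of the ontology, in the spirit of a structured-output view of functional annotation.

```python
"""Toy sketch (hypothetical data): re-evaluating one fixed GO-style
prediction as the experimental annotation of a protein accumulates."""

from typing import Dict, Set, Tuple

# Hypothetical toy ontology: each term maps to its parent terms (a DAG).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def propagate(terms: Set[str]) -> Set[str]:
    """Close a term set under the ancestor relation, so the result is a
    consistent subgraph of the ontology (every term implies its parents)."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def precision_recall(predicted: Set[str], true: Set[str]) -> Tuple[float, float]:
    """Protein-centric precision and recall over propagated term sets."""
    p, t = propagate(predicted), propagate(true)
    overlap = len(p & t)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(t) if t else 0.0
    return precision, recall


# One fixed prediction, evaluated at time t0 and re-evaluated at t1,
# after (hypothetical) additional experimental annotations arrive.
prediction = {"A1", "B1"}
annotation_t0 = {"A1"}              # incomplete knowledge at t0
annotation_t1 = {"A1", "A2", "B1"}  # accumulated knowledge at t1

print(precision_recall(prediction, annotation_t0))  # (0.6, 1.0)
print(precision_recall(prediction, annotation_t1))  # (1.0, 0.8333333333333334)
```

In this toy case the same prediction looks imprecise but fully sensitive against the incomplete annotation at t0, and the picture reverses at t1. Assuming a fixed ontology and annotations that only accumulate, the precision of a fixed prediction cannot decrease (the propagated prediction set is fixed while its overlap with the truth can only grow), whereas recall can move in either direction, which is one simple way the choice of metric interacts with incomplete knowledge.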

References (14)
Andrew K. Rider, Reid A. Johnson, Darcy A. Davis, T. Ryan Hoens, Nitesh V. Chawla, Classifier Evaluation with Missing Negative Class Labels. Advances in Intelligent Data Analysis XII, pp. 380-391 (2013). DOI: 10.1007/978-3-642-41398-8_33
Robert Rentzsch, Christine A. Orengo, Protein function prediction – the power of multiplicity. Trends in Biotechnology, vol. 27, pp. 210-219 (2009). DOI: 10.1016/J.TIBTECH.2009.01.002
Christophe Dessimoz, Nives Škunca, Paul D. Thomas, CAFA and the Open World of protein function predictions. Trends in Genetics, vol. 29, pp. 609-610 (2013). DOI: 10.1016/J.TIG.2013.09.005
Wyatt T. Clark, Predrag Radivojac, Information-theoretic evaluation of predicted ontological annotations. Bioinformatics, vol. 29, pp. 53-61 (2013). DOI: 10.1093/BIOINFORMATICS/BTT228
M Ashburner, CA Ball, JA Blake, D Botstein, H Butler, JM Cherry, AP Davis, K Dolinski, SS Dwight, JT Eppig, MA Harris, DP Hill, L Issel-Tarver, A Kasarskis, S Lewis, JC Matese, JE Richardson, M Ringwald, GM Rubin, G Sherlock, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, vol. 25, pp. 25-29 (2000). DOI: 10.1038/75556
Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M Yunes, Ameet S Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel WA Buchan, Kevin Bryson, David T Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas M Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N Wass, Michael JE Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis AI Kourmpetis, Aalt DJ Van Dijk, Cajo JF Ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Bairoch, Michal Linial, Patricia C Babbitt, Steven E Brenner, Christine Orengo, Burkhard Rost, Sean D Mooney, Iddo Friedberg, A large-scale evaluation of computational protein function prediction. Nature Methods, vol. 10, pp. 221-227 (2013). DOI: 10.1038/NMETH.2340
I. Friedberg, Automated protein function prediction—the genomic challenge. Briefings in Bioinformatics, vol. 7, pp. 225-242 (2006). DOI: 10.1093/BIB/BBL004
Curtis Huttenhower, Matthew A. Hibbs, Chad L. Myers, Amy A. Caudy, David C. Hess, Olga G. Troyanskaya, The impact of incomplete knowledge on evaluation. Bioinformatics, vol. 25, pp. 2404-2410 (2009). DOI: 10.1093/BIOINFORMATICS/BTP397
Charles Elkan, Keith Noto, Learning classifiers from only positive and unlabeled data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 213-220 (2008). DOI: 10.1145/1401890.1401920