GOstruct 2.0: Automated Protein Function Prediction for Annotated Proteins

Authors: Indika Kahanda, Asa Ben-Hur

DOI: 10.1145/3107411.3107417

Keywords: Task (project management), Computer science, Artificial intelligence, Protein function prediction, Function (engineering), Data mining, A protein, Machine learning

Abstract: Automated Protein Function Prediction is the task of automatically predicting functional annotations for a protein based on gold-standard annotations derived from experimental assays. These experiment-based annotations accumulate over time: proteins without annotations get annotated, and new functions of already annotated proteins are discovered. Therefore, function prediction can be considered a combination of two sub-tasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In previous work, we analyzed the performance of several protein function prediction methods in these two scenarios. Our results showed that GOstruct, which is based on the structured output framework, had lower accuracy on proteins with existing annotations, while its performance on un-annotated proteins was similar to its performance in cross-validation. In this work, we present GOstruct 2.0, which includes improvements that allow the model to make use of information about a protein's current annotations to better handle the task of predicting novel annotations for previously annotated proteins. This is highly important for model organisms, where most proteins have some level of annotations. Experimental results on human data show that GOstruct 2.0 outperforms the original GOstruct in this task, demonstrating the effectiveness of the proposed improvements. This is the first study that focuses on adapting the structured output framework for applications in which labels are incomplete by nature.
