De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

Authors: Tomasz Kosciolek, David T. Jones

DOI: 10.1371/JOURNAL.PONE.0092197

Keywords: Algorithm, Bioinformatics, Sequence alignment, Multiple sequence alignment, Protein structure prediction, Covariance, Estimation of covariance matrices, Globular protein, Protein domain, Protein structure, Biology

Abstract: The advent of high-accuracy residue-residue intra-protein contact prediction methods has enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall, we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than using either statistical potentials or contacts alone. Results show up to nearly 80% correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems or from insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of the final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
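The quantity the abstract relates to model quality, the fraction of predicted long-range contacts that a model satisfies, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `satisfied_fraction`, the 8 Å distance cutoff, the minimum sequence separation of 23 residues, and the use of one representative atom per residue are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def satisfied_fraction(coords, contacts, cutoff=8.0, min_sep=23):
    """Fraction of predicted long-range contacts satisfied by a model.

    coords   : list of (x, y, z) tuples, one representative atom per residue
               (0-based residue index); which atom to use is an assumption.
    contacts : iterable of (i, j) predicted contact pairs.
    cutoff   : distance in angstroms below which a contact counts as
               satisfied (assumed value, not taken from the abstract).
    min_sep  : minimum sequence separation for a pair to count as
               long-range (assumed value, not taken from the abstract).
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) >= min_sep]
    if not long_range:
        # No long-range predictions: define the fraction as 0.0
        return 0.0
    hits = sum(
        1 for i, j in long_range
        if math.dist(coords[i], coords[j]) <= cutoff
    )
    return hits / len(long_range)


# Toy model: 30 residues on a straight line, 3.8 angstroms apart,
# with the last residue moved close to the first one.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(30)]
toy_coords[29] = (0.0, 5.0, 0.0)

# Two long-range pairs and one short-range pair (ignored by the filter).
toy_contacts = [(0, 29), (1, 28), (2, 5)]

toy_fraction = satisfied_fraction(toy_coords, toy_contacts)
print(toy_fraction)
```

In this toy case only the pair (0, 29) lies within the cutoff, so one of the two long-range pairs is satisfied. A score of this kind could be computed for each decoy in an ensemble and used alongside other ensemble-derived measures to rank models, in the spirit of what the abstract describes.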
