Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins.

Authors: Isidore Rigoutsos, Aris Floratos, Christos Ouzounis, Yuan Gao, Laxmi Parida

DOI: 10.1002/(SICI)1097-0134(19991101)37:2<264::AID-PROT11>3.0.CO;2-C

Abstract: Using TEIRESIAS, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for the protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families and previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets can also capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and sensitive engines for homology searches, evolutionary studies, and protein structure prediction. Proteins 1999;37:264–277. ©1999 Wiley-Liss, Inc.
