Authors: Isidore Rigoutsos, Aris Floratos, Christos Ouzounis, Yuan Gao, Laxmi Parida
DOI: 10.1002/(SICI)1097-0134(19991101)37:2<264::AID-PROT11>3.0.CO;2-C
Keywords:
Abstract: Using TEIRESIAS, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for the protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families and previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets can also capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and sensitive engines for homology searches, evolutionary studies, and protein structure prediction. Proteins 1999;37:264–277. ©1999 Wiley-Liss, Inc.