An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally related subfamilies

Authors: Lawrence A. Kelley, Stephen P. Gardner, Michael J. Sutcliffe

DOI: 10.1093/PROTEIN/9.11.1063

Abstract: Unlike structures determined by X-ray crystallography, which are deposited in the Brookhaven Protein Data Bank (Abola et al., 1987) as a single structure, each NMR-derived structure is often deposited as an ensemble containing many structures, each consistent with the restraint set used. However, there is often a need to select a single 'representative' structure, or a 'representative' subset of structures, from such an ensemble. This is useful, for example, in the case of homology modelling or when compiling a relational database of protein structures. It has been shown that cluster analysis, based on overall fold, followed by selection of the structure closest to the centroid of the largest cluster, is likely to identify a structure more representative of the ensemble than the commonly used minimized average structure (Sutcliffe, 1993). Two approaches to the problem of clustering ensembles of NMR-derived structures have been described. One of these (Adzhubei et al., 1995) performs a pairwise superposition of all structures using Cα atoms to generate a set of r.m.s. distances. After cluster analysis based on these r.m.s. distances, a user-defined cut-off is required to determine the final membership of the clusters and therefore the representative structures. The other approach (Diamond, 1995) uses collective superpositions and rigid-body transformations. Again, the position at which to draw the cut-off for a particular clustering pattern was not addressed. Whenever fixed values are used as a cut-off in clustering, there is a danger of missing 'true' clusters under the threshold imposed by a rigid cut-off value. Considering the highly diverse nature of proteins, it would seem most appropriate to avoid the use of predefined values in determining clusters. In fact, in the 302 ensembles we have studied, the average r.m.s. distance across an ensemble varied between 0.29 and 11.3 Å (mean value 3.0, SD 1.9 Å). Here we present an automated method for the determination of a cut-off value that avoids the dangers of using fixed values for this purpose. We have developed a computer program that automatically, systematically and rapidly (i) clusters an ensemble of structures into a set of conformationally related subfamilies, and (ii) selects a representative structure from each cluster. The program uses the method of average linkage to define how clusters are built up, followed by the application of a penalty function that seeks to minimize simultaneously the number
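The workflow outlined in the abstract (average-linkage clustering on pairwise r.m.s. distances, an automatically chosen cut-off, and one representative per cluster) can be illustrated with a minimal Python sketch. This is not the authors' program. The text above does not give the exact form of the penalty function, so the one used here (average within-cluster spread, rescaled to the same range as the cluster count, plus the number of clusters) is an assumption made only for illustration. The function names `mean_link`, `spread`, `merge_steps`, `choose_partition` and `representative` are hypothetical, and the representative is taken as the member with the smallest summed distance to its cluster mates, a stand-in for "closest to the centroid" that needs only the distance matrix.

```python
from itertools import combinations


def mean_link(a, b, dist):
    """Average linkage: mean distance over all cross-cluster pairs."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def spread(cluster, dist):
    """Mean pairwise distance inside one cluster (0.0 for a singleton)."""
    pairs = list(combinations(cluster, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0


def merge_steps(dist):
    """Return the partition obtained after each average-linkage merge."""
    clusters = [[i] for i in range(len(dist))]
    steps = []
    while len(clusters) > 1:
        # Pick the two clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_link(clusters[p[0]], clusters[p[1]], dist),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        steps.append([sorted(c) for c in clusters])
    return steps


def choose_partition(dist):
    """Select the merge step that minimizes the ASSUMED penalty."""
    n = len(dist)
    steps = merge_steps(dist)
    if not steps:  # zero or one structure: nothing to merge
        return [[i] for i in range(n)]
    # Average spread over clusters with more than one member, per step.
    av = []
    for part in steps:
        multi = [c for c in part if len(c) > 1]
        av.append(sum(spread(c, dist) for c in multi) / len(multi))
    lo, hi = min(av), max(av)
    # ASSUMPTION: rescale spread to 1..n-1 so it is comparable to the count.
    scale = (n - 2) / (hi - lo) if hi > lo else 0.0
    penalties = [scale * (s - lo) + 1 + len(p) for s, p in zip(av, steps)]
    best = min(range(len(steps)), key=penalties.__getitem__)
    return sorted(steps[best])


def representative(cluster, dist):
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Toy "ensemble": 1-D positions stand in for structures, and absolute
# differences stand in for pairwise r.m.s. distances after superposition.
coords = [0, 1, 3, 20, 22, 25]
dist = [[abs(p - q) for q in coords] for p in coords]
clusters = choose_partition(dist)
reps = [representative(c, dist) for c in clusters]
print(clusters, reps)
```

On this toy matrix the sketch settles on two clusters, [0, 1, 2] and [3, 4, 5], with members 1 and 4 as representatives, and no cut-off is supplied by the user. In practice the distance matrix would come from pairwise superposition of the ensemble members.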
