STRUctural CLAssification

EXAMPLE ANALYSIS: a subset of PD-(D/E)xK deoxyribonucleases and related tRNA endonucleases

Graphical presentation of the superimposed structures: helices are colored red and strands yellow.

The example file was prepared in SwissPDBViewer. The individual structures were downloaded from the Protein Data Bank, with the exception of two domains of A. fulgidus EndA (Li and Abelson, J Mol Biol. 2000 Sep 22;302(3):639-48), which were provided by Dr. Hong Li. Only one chain (a monomer) was used in each case. The structures were superimposed, and elements outside the common core (N- and C-terminal extensions) were deleted.

It is important to remove non-homologous elements before the analysis; otherwise they introduce noise that precludes the inference of meaningful phylogenies. If the set of structures contains non-homologous proteins, or proteins with non-homologous segments (additional domains, etc.), then the dendrograms generated by the individual methods, as well as the consensus tree produced by the STRUCLA server, will have no phylogenetic meaning, although they will still reflect the degree of "structural relatedness", however that term is defined.
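In this example the terminal extensions were trimmed interactively in SwissPDBViewer. As a hypothetical illustration of the same idea, the Python sketch below assumes that a structure-based alignment is available as gapped strings (one per structure, `-` marking a missing residue) and cuts every sequence to the window between the first and last alignment column in which all structures have a residue. The input format and the function names are assumptions, not part of STRUCLA; internal insertions are kept by this sketch and would still have to be inspected by hand.

```python
# Hypothetical helper, not part of STRUCLA: trims N- and C-terminal
# extensions lying outside the common core of a structure-based alignment.
# Assumption: each structure is one gapped string ('-' = no residue).


def core_window(aligned):
    """Return (start, end) spanning the first to the last column in which
    every structure has a residue; `end` is exclusive."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have equal length")
    shared = [i for i in range(length) if all(seq[i] != "-" for seq in aligned)]
    if not shared:
        raise ValueError("no column is shared by all structures")
    return shared[0], shared[-1] + 1


def trim_terminal_extensions(aligned):
    """Cut every aligned sequence down to the common-core window.
    Internal gaps (insertions) inside the window are left untouched."""
    start, end = core_window(aligned)
    return [seq[start:end] for seq in aligned]


if __name__ == "__main__":
    # Toy alignment: the first structure carries an N-terminal extension,
    # the second a C-terminal one.
    toy = ["MKAVLDE-GR--",
           "---VLDEQGRST",
           "--AILDE-GK--"]
    for row in trim_terminal_extensions(toy):
        print(row)
```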

We selected a cutoff of 3.5 Å to increase the number of residue pairs regarded as equivalent in the superimposed structures. The lower the cutoff, the fewer residues from diverged regions are included in the comparison. We recommend testing different cutoff values on the same data set (especially for strongly diverged proteins) as one way to estimate the robustness of the final tree.
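The effect of the cutoff can be sketched in a few lines of Python. This is not the server's code: it assumes the structures are already superimposed and that candidate residue pairs come from a structure-based alignment as pairs of C-alpha coordinates in Å, with a pair counted as equivalent only if the two atoms lie within the cutoff. It also assumes that RMS100 refers to the size-normalized RMSD100 of Carugo and Pongor, RMS100 = RMS / (1 + ln √(N/100)), where N is the number of equivalent pairs. All function names are hypothetical.

```python
import math

# Minimal sketch (not STRUCLA code). Assumptions:
#   * the two structures are already superimposed;
#   * candidate residue pairs are given as pairs of C-alpha coordinates
#     (x, y, z) in angstroms, e.g. taken from a structure-based alignment;
#   * RMS100 is the size-normalized RMSD100 of Carugo and Pongor.


def equivalent_pairs(pairs, cutoff=3.5):
    """Keep candidate pairs whose C-alpha atoms lie within `cutoff` angstroms."""
    return [(a, b) for a, b in pairs if math.dist(a, b) <= cutoff]


def rms(pairs):
    """Root-mean-square distance over the given coordinate pairs."""
    if not pairs:
        raise ValueError("no equivalent pairs")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def rms100(rms_value, n):
    """Rescale an RMS computed over n equivalent pairs to a 100-residue reference."""
    denominator = 1.0 + math.log(math.sqrt(n / 100.0))
    if denominator <= 0:
        # 1 + ln(sqrt(n/100)) is not positive for n below about 14.
        raise ValueError("too few equivalent pairs for RMS100")
    return rms_value / denominator


if __name__ == "__main__":
    # Toy superimposed pairs whose C-alpha atoms are 1, 2, 3 and 4 angstroms apart.
    toy = [((0.0, 0.0, 0.0), (float(d), 0.0, 0.0)) for d in (1, 2, 3, 4)]
    for cutoff in (3.5, 2.5):
        kept = equivalent_pairs(toy, cutoff)
        print(f"cutoff {cutoff} A: N = {len(kept)}, RMS = {rms(kept):.2f} A")
```

With the toy data, lowering the cutoff from 3.5 Å to 2.5 Å excludes the pair that is 3 Å apart, so N falls from 3 to 2 and the RMS from about 2.16 Å to about 1.58 Å: the more diverged part of the structure simply stops being compared.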

The input file was uploaded, analyzed locally or sent to remote servers, and the results (distance matrices and dendrograms) were collected. The trees (in NEXUS format) and the distance matrices were sent by email. Below we show graphical presentations of the output files, visualized using TreeView (the STRUCLA server itself does not yet display the trees, although we plan to implement this feature in the future).
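The trees in the figures are neighbor-joining (NJ) trees computed from the distance matrices. The exact NJ implementation used by STRUCLA is not described here, so the following is only a minimal textbook sketch of the neighbor-joining algorithm of Saitou and Nei. It assumes a symmetric distance matrix stored as a dictionary of dictionaries keyed by taxon labels (a hypothetical format) and returns the unrooted tree as a Newick string, the notation also used inside NEXUS tree blocks.

```python
# Textbook neighbor-joining sketch; the input format and function name are
# hypothetical. `dist[a][b]` must be defined for every pair of distinct labels.


def neighbor_joining(labels, dist):
    """Build an unrooted NJ tree and return it as a Newick string."""
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    # Work on a copy so the caller's matrix is left untouched.
    d = {a: {b: dist[a][b] for b in labels if b != a} for a in labels}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node u.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single (unrooted) central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    matrix = {"A": {"B": 3, "C": 8, "D": 9},
              "B": {"A": 3, "C": 9, "D": 10},
              "C": {"A": 8, "B": 9, "D": 9},
              "D": {"A": 9, "B": 10, "C": 9}}
    print(neighbor_joining(taxa, matrix))
```

For this additive toy matrix the sketch recovers the generating tree, with A and B separated from C and D by an internal branch of length 3. With real, non-additive structural distances NJ can produce short negative branch lengths, which this sketch does not correct.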

Figure 1: The unrooted NJ tree generated using the "traditional" RMS measure.

Figure 2: The unrooted NJ tree generated using the "improved" RMS100 measure.

Figure 3: The unrooted NJ tree generated with the method proposed by Johnson et al. (1990).

Figure 4: The unrooted NJ tree generated with the method proposed by Grishin (1997).

Figure 5: The unrooted NJ tree generated with the method proposed by Carugo and Pongor (2002), as implemented in their PRIDE server.

Figure 6: The consensus tree. The values at the nodes indicate the percentage of trees, among those produced by the different methods, in which a given subfamily occurs. Branches with support below 50% should be regarded as unresolved.
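The support values in the consensus tree can be illustrated by counting how often each split (bipartition of the taxa) occurs among the trees produced by the individual methods; here a "subfamily" is treated as one side of such a split. In this hypothetical Python sketch each tree is given as a list of taxon groups, every split is stored as the side that does not contain a fixed reference taxon, and splits found in a strict majority (more than 50%) of the trees are treated as resolved. The input format, the function names, and the threshold handling are assumptions; STRUCLA's own consensus procedure may differ.

```python
from collections import Counter

# Hypothetical sketch of split (clade) support among several unrooted trees.
# Assumption: each tree is a list of taxon groups, one per internal branch.


def canonical_split(group, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon,
    so that a group and its complement map to the same split."""
    group = frozenset(group)
    ref = min(taxa)
    return frozenset(taxa) - group if ref in group else group


def split_support(trees, taxa):
    """Percentage of trees in which each split occurs."""
    counts = Counter()
    for groups in trees:
        # A set, so a split listed twice within one tree is counted once.
        counts.update({canonical_split(g, taxa) for g in groups})
    return {split: 100.0 * c / len(trees) for split, c in counts.items()}


def resolved_splits(trees, taxa, threshold=50.0):
    """Splits supported by a strict majority (more than `threshold` percent)."""
    support = split_support(trees, taxa)
    return {s: p for s, p in support.items() if p > threshold}


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]
    # Five toy trees, e.g. one per distance measure.
    trees = [
        [{"A", "B"}, {"D", "E"}],
        [{"A", "B"}, {"D", "E"}],
        [{"A", "B"}, {"C", "D"}],
        [{"A", "C"}, {"D", "E"}],
        [{"A", "B"}, {"D", "E"}],
    ]
    # Splits are printed as the side without taxon A (so {A,B} appears as CDE).
    support = split_support(trees, taxa)
    for split, pct in sorted(support.items(), key=lambda kv: -kv[1]):
        status = "resolved" if pct > 50.0 else "unresolved"
        print("".join(sorted(split)), f"{pct:.0f}%", status)
```

With five input trees, as in this example analysis, no split can have exactly 50% support, so the strict-majority rule used here and the "below 50%" rule in the caption give the same result.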

The structure-based trees show that the three domains of the tRNA endonuclease EndA (1a79, 2afa and 2afb; the latter two are not available from the PDB) group together. The archaeal tRNA endonucleases are most closely related to the archaeal Holliday junction resolvases (1gef and 1hh1). The other branch of the tree is occupied by restriction enzymes, with the exception of one branch grouping the DNA repair enzyme MutH (2azo) and phage lambda exonuclease (1avq). However, this branch is not reliably supported according to the consensus analysis, and the relationships between 2azo, 1avq and 1nae and the other branch grouping 1kc6/1dmu/1rvb should be regarded as unresolved. A relationship between 1rvb (EcoRV), 1dmu (BglI) and 1kc6 (HincII) has been reported previously in the literature (Bujnicki, 2001).