STRUctural CLAssification

General description

STRUCLA (STRUcture CLAssification) is a web tool for generating trees from distances inferred from protein structures. Several methods for estimating evolutionary distances are used, and the list of available methods is continuously expanding. Some of them are described in the section Computational Details.

Computational Details

STRUCLA reads a PDB-format file containing multiple structures and sends the output to the e-mail address provided by the user. The output includes the following files: a set of distance matrices in PHYLIP format (one for each measure of structural divergence) and a set of unrooted trees in NEXUS format (one for each method). The measures currently available are 'classical' RMSD; RMSD100, defined by O. Carugo and S. Pongor; the measure proposed by Johnson et al. (1990); and the measure proposed by Grishin (1995). Our server can also submit jobs to, and receive jobs from, the PRIDE server (Carugo and Pongor).
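For illustration, the sketch below converts a plain RMSD into RMSD100 and writes a distance matrix in PHYLIP's square format. RMSD100 follows the normalization published by Carugo and Pongor, RMSD100 = RMSD / (1 + ln √(N/100)), where N is the number of aligned residues; which N STRUCLA actually uses and how it lays out its matrix files are assumptions. The function names `rmsd100` and `phylip_matrix` are hypothetical, and the Johnson et al. and Grishin measures are not sketched here.

```python
import math
from typing import List, Sequence


def rmsd100(rmsd: float, n: int) -> float:
    """Normalize an RMSD to a 100-residue reference length.

    RMSD100 = RMSD / (1 + ln(sqrt(N / 100))), N = number of aligned residues
    (assumed to be the length STRUCLA uses). The denominator is positive only
    for N above roughly 14, so smaller N is rejected.
    """
    denom = 1.0 + math.log(math.sqrt(n / 100.0))
    if denom <= 0:
        raise ValueError("RMSD100 is undefined for such a short alignment")
    return rmsd / denom


def phylip_matrix(names: Sequence[str], dist: List[List[float]]) -> str:
    """Format a square distance matrix in PHYLIP (square) layout.

    First line: number of taxa. Each following row: a name padded or
    truncated to the classic 10-character PHYLIP field, then the distances.
    """
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)
        lines.append(label + "".join(f" {d:9.4f}" for d in row))
    return "\n".join(lines) + "\n"
```

For N = 100 the correction term vanishes and RMSD100 equals the plain RMSD; longer alignments are scaled down, which is the point of the normalization.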

A typical job (fewer than 20 structures) usually finishes in under one minute (our example took 34 seconds). The running time increases quadratically with the number of structures, since a distance is computed for every pair of structures.
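The quadratic growth follows from the pair count: n structures give n(n-1)/2 pairwise comparisons. The sketch below makes that arithmetic explicit and extrapolates a run time from one reference measurement, under the simplifying assumption that time is proportional to the number of pairs. The function names are hypothetical, and `t_ref`/`n_ref` stand for a measurement of your own (the structure count of the 34-second example is not given here).

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct structure pairs for which a distance is computed."""
    return n * (n - 1) // 2


def scaled_runtime(t_ref: float, n_ref: int, n: int) -> float:
    """Extrapolate run time, assuming it is proportional to the pair count."""
    return t_ref * pairwise_comparisons(n) / pairwise_comparisons(n_ref)
```

Doubling the number of structures from 10 (45 pairs) to 20 (190 pairs) multiplies the pair count, and so the estimated time, by about 4.2.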

Cut-off

After the protein structures are superimposed, pairs of equivalent atoms are defined according to a user-defined cut-off: the RMSD is calculated only over those pairs of atoms in the two compared structures whose Euclidean distance is less than or equal to the cut-off. The default cut-off is 3.5 Å.
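A minimal sketch of the cut-off rule, assuming the two structures have already been superimposed and that `a[i]` and `b[i]` are the candidate equivalent atoms (how STRUCLA performs the superposition and pairs the atoms is not described here). The name `cutoff_rmsd` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def cutoff_rmsd(a: Sequence[Coord], b: Sequence[Coord],
                cutoff: float = 3.5) -> Tuple[float, int]:
    """RMSD over atom pairs whose distance is <= cutoff (in angstroms).

    a and b must already be superimposed; a[i] is assumed to be equivalent
    to b[i]. Returns (rmsd, number_of_pairs_used); rmsd is NaN if no pair
    passes the cut-off.
    """
    if len(a) != len(b):
        raise ValueError("coordinate lists must have equal length")
    kept = []
    for p, q in zip(a, b):
        # Squared Euclidean distance; comparing squares avoids a sqrt per pair.
        d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
        if d2 <= cutoff * cutoff:
            kept.append(d2)
    if not kept:
        return float("nan"), 0
    return math.sqrt(sum(kept) / len(kept)), len(kept)
```

Returning the number of pairs used alongside the RMSD matters because a strict cut-off can make two structures look very close simply by discarding most of their atoms.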

Inputting data into the program

STRUCLA operates on a Protein Data Bank (PDB)-formatted file containing multiple (at least three) protein structures. We recommend using SwissPDBViewer to generate the input files, but in principle even manually concatenating ASCII files obtained from the PDB should work. The names of the structures are taken from the SPDBVn or HEADER records. Each structure should begin with either a "COMPND ?" line or a "HEADER name of the structure" line. If each structure begins with "COMPND", it should end with an "SPDBVn name of the structure" line; if it begins with "HEADER", it should end with a "TER" line. The whole file should end with "END". Only the coordinates in ATOM records are taken into account: HETATM records (for instance, selenomethionyl residues) are ignored, as are other lines such as REMARK.
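The input rules above can be sketched as a small parser. This is an illustration of the described layout, not STRUCLA's own reader; the name `parse_multi_pdb` is hypothetical. One assumption worth noting: because a HEADER-started structure ends at its first TER line, a multi-chain file taken directly from the PDB (which has a TER after each chain) would be cut after its first chain.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


def parse_multi_pdb(text: str) -> List[Tuple[str, List[Coord]]]:
    """Split a multi-structure PDB file into (name, ATOM coordinates) pairs.

    A structure starts with a HEADER or COMPND line. HEADER-started
    structures take their name from the HEADER line and end at TER;
    COMPND-started ones end at an SPDBVn line that carries the name.
    Only ATOM records are read; HETATM, REMARK, etc. are skipped.
    """
    structures: List[Tuple[str, List[Coord]]] = []
    current: Optional[Dict] = None
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "END":          # end of the whole file ("ENDMDL" differs)
            break
        if current is None:
            if record in ("HEADER", "COMPND"):
                name = line[6:].strip() if record == "HEADER" else ""
                current = {"start": record, "name": name, "atoms": []}
            continue
        if record == "ATOM":
            # Fixed PDB columns 31-38, 39-46, 47-54 hold x, y, z.
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            current["atoms"].append((x, y, z))
        elif current["start"] == "HEADER" and record == "TER":
            structures.append((current["name"], current["atoms"]))
            current = None
        elif current["start"] == "COMPND" and record == "SPDBVn":
            structures.append((line[6:].strip(), current["atoms"]))
            current = None
    return structures
```

Reading coordinates by fixed column rather than by splitting on whitespace matters in PDB files, where adjacent numeric fields can run together without a separating space.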

Alternatively, a set of structures specified by their PDB IDs can be retrieved from the local copy of the PDB.
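As a rough sketch of what retrieval by ID could involve, the code below locates entries in a local mirror and concatenates their ATOM records into a single HEADER ... TER ... END file of the kind described above. It assumes the wwPDB "divided" mirror layout (`<root>/ab/pdb1abc.ent.gz`, where `ab` is the middle two characters of the ID); the actual layout of STRUCLA's local copy is not documented here, and all function names are hypothetical. Multi-model (e.g. NMR) entries would need extra handling, because all models' ATOM lines are copied.

```python
import gzip
import os
from typing import Iterable, List, Tuple


def mirror_path(root: str, pdb_id: str) -> str:
    """Path of an entry in an assumed wwPDB-style 'divided' mirror."""
    pdb_id = pdb_id.lower()
    return os.path.join(root, pdb_id[1:3], f"pdb{pdb_id}.ent.gz")


def build_input(entries: Iterable[Tuple[str, str]]) -> str:
    """Concatenate (pdb_id, pdb_text) pairs into one multi-structure file.

    Only ATOM records are copied, so each structure gets exactly one TER
    line instead of the per-chain TER records of the original entry.
    """
    out: List[str] = []
    for pdb_id, text in entries:
        out.append(f"HEADER    {pdb_id}")
        out.extend(line for line in text.splitlines()
                   if line.startswith("ATOM  "))
        out.append("TER")
    out.append("END")
    return "\n".join(out) + "\n"


def load_from_mirror(root: str, ids: Iterable[str]) -> str:
    """Read each gzipped entry from the mirror and build one input file."""
    entries = []
    for pdb_id in ids:
        with gzip.open(mirror_path(root, pdb_id), "rt") as fh:
            entries.append((pdb_id, fh.read()))
    return build_input(entries)
```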

The complete example file and the results of its analysis are also provided.