No fold recognition method is always best. Results from studies of different fold recognition methods

Arne Elofsson
Stockholm Bioinformatics Center
Department of Biochemistry
Stockholm University
10691 Stockholm

Here we report results from two recent studies of different fold recognition methods. In the first study we performed the first large-scale test (10000 pairs) of alignment quality using several different alignment methods (local, global, profile alignment, hmmer, sam.t98, clustalW, sspsi) (Elofsson, 2000 submitted). We show that both evolutionary information and predicted secondary structure information improve the alignment quality. The best alignments are obtained from a method that combines a sequence profile obtained from psiblast with predicted secondary structures.

In the second study, we present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers (Bujnicki et al 2000 submitted). Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM and INBGU. The assessment was carried out using as prediction targets a large number of selected protein structures released between October 1999 and April 2000. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: "easy" and "hard". We found that all servers were able to find the correct answer for the vast majority of the easy targets when a structurally similar fold was present in the servers' fold libraries. However, among the hard targets - where standard methods such as PSI-BLAST fail - we found that the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Unfortunately, the increased sensitivity of the fold-recognition servers over standard methods came at the cost of low specificity.

Probably the most interesting observation from these studies is that no single method always produces the best results (in fold recognition or alignment). For instance, we show that for fold-related pairs almost twice as many good models can be created when the best result from any method is used as when only the best single method is used, and that each server had a number of cases with a correct assignment where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one method in difficult prediction tasks. It also implies that fold recognition performance could be improved significantly if several methods could be combined without losing specificity.
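
The comparison between the best single method and "any method" can be illustrated with a small set computation. The following is only a sketch, not the procedure used in the studies: the method names, target identifiers and the toy data are hypothetical, and it assumes that each method's result has already been reduced to the set of targets for which it gave a correct answer.

```python
from typing import Dict, Set, Tuple


def best_single_method(correct: Dict[str, Set[str]]) -> Tuple[str, int]:
    """Return the method with the most correct targets and that count."""
    name = max(correct, key=lambda m: len(correct[m]))
    return name, len(correct[name])


def solved_by_any(correct: Dict[str, Set[str]]) -> Set[str]:
    """Targets for which at least one method was correct."""
    return set().union(*correct.values())


def uniquely_solved(correct: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each method, the targets that only this method got right."""
    unique = {}
    for method, targets in correct.items():
        others = set().union(
            *(t for name, t in correct.items() if name != method)
        )
        unique[method] = targets - others
    return unique


# Hypothetical toy data (NOT results from the studies):
# method name -> set of targets with a correct assignment.
toy = {
    "method_A": {"t1", "t2", "t3"},
    "method_B": {"t2", "t4"},
    "method_C": {"t3", "t5"},
}

best_name, best_count = best_single_method(toy)
any_count = len(solved_by_any(toy))
print(best_name, best_count, any_count)
print(uniquely_solved(toy))
```

In this toy example the best single method is correct for three targets, while five targets are solved by at least one method, and every method has one target that only it solves. Note that such an upper bound assumes one can tell which method is right for each target, which is exactly the specificity problem mentioned above.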

In conclusion, we would like to encourage all protein structure predictors to take advantage of the variety of methods available.

In both of these studies we have used novel methods to measure the quality of a model generated by a fold recognition method. We will also discuss the advantages of using these novel methods for measuring fold recognition capacity (Siew et al 2000, in press; Cristobal et al 2000, manuscript in preparation).
