Protein Structure Prediction (PSP)

Proteins are large organic compounds made of amino acidic residues arranged in a linear chain (primary structure). Most proteins fold into unique three-dimensional (3D) structures called interchangeably tertiary, folded, or native structures. Protein Structure Prediction (PSP) aims to reconstruct the 3D structure of a given protein starting from its primary structure (chain of residues). It is a well known fact that the 3D structure of a protein only depends on its primary structure. Discovering the tertiary structure of a protein can provide important clues about how the protein performs its function. PSP is one of the most important and still unsolved problem in computational biology.

PSP using contact maps

A contact map of a given protein

*P*is a binary matrix

*M*such that

*M*

*i,j*= 1 iff the physical distance between residues

*i*and

*j*in the native structure is less than or equal to a pre-assigned threshold

*t*. The contact map of each protein is a distinctive signature of its folded structure. Predicting the tertiary structure of a protein directly from its primary structure is a very complex and still unsolved problem. An alternative and probably more feasible approach is to predict the contact map of a protein from its primary structure and then to compute the tertiary structure starting from the predicted contact map. This last problem has been recently proven to be NP-Hard. We developed a heuristic algorithm called COMAR (Contact Map Reconstruction) that is able to reconstruct in a few seconds a 3D model that exactly matches the target contact map. Our method computes an exact model for the protein independently of the contact map threshold with 100% efficiency when tested on 1760 proteins from different structural classes. Repeated applications of our method (starting from randomly chosen distinct initial solutions) show that the same contact map may admit (depending on the threshold) quite different 3D models. Extensive experimental results show that contact map thresholds ranging from 10 to 18 Ångstrom allow reconstructing 3D models that are very similar to the proteins native structure.

In order to simulate possible scenarios of reconstruction from predicted (and therefore highly noised) contact maps we test the performances of COMAR on native contact maps when a perturbation with random errors is introduced. From our analysis we obtain that our algorithm performs better reconstructions on blurred contact maps when contacts are under predicted than over predicted. Moreover we give a new version of the algorithm (FT-COMAR [Fault Tolerant-COMAR]) which can be used with incomplete contact maps. FT-COMAR can ignore up to 75% of the contact map and still recover from the remaining 25% entries a three dimensional structure whose root mean square deviation (RMSD) from the native one is less then 4 Å. Our results indicate that the quality more than the quantity of predicted contacts is relevant to the protein 3D reconstruction and that some hints about “unsafe” areas in the predicted contact maps can be useful to improve reconstruction quality. For this, we implement a very simple filtering procedure to detect unsafe areas in contact maps and we show that by this and in the presences of errors the performance of the algorithm can be significantly improved. Furthermore, we show that both COMAR and FT-COMAR overcome previous state-of-the-art algorithms for the same task.

Protein Structure Selection

Protein Structure Selection (PSS), instead of reconstructing a 3D model for the given chain, aims to select among a given, possibly large, number of 3D structures (called decoys) those that are closer (according to a given notion of distance) to the original (unknown) one. Each decoy is represented by a set of points in three dimensions. Existing methods for solving PSS make use of suitably defined energy functions which heavily rely on the primary structure of the protein and on protein chemistry. We use a completely different approach to PSS which does not take advantage at all of the knowledge of the primary structure of the protein but only relies on the graph theoretic properties of the decoys graphs (vertices represent residues and edges represent pairs of amino acids whose euclidean distance is less than or equal to a fixed threshold: contact maps). Even if our methods only rely on approximate geometric information, experimental results show that some of the graph properties we adopt score similarly to energy-based filtering functions in selecting the best decoys.

Our results show the principal role of geometric information in PSS, setting a new starting point, and filtering method, for existing energy function based techniques.