Our
research group is interested in designing and developing
algorithms and tools to solve biological problems.
Currently our research achievements regard contact maps and
their graph properties, aiming to improve protein structure
prediction.
Protein
Structure Prediction (PSP)
Proteins are large organic compounds made of amino acidic residues
arranged in a linear chain (primary structure). Most
proteins fold into unique three-dimensional (3D) structures
called interchangeably tertiary, folded, or native
structures. Protein Structure Prediction (PSP) aims to
reconstruct the 3D structure of a given protein starting
from its primary structure (chain of residues). It is a
well known fact that the 3D structure of a protein only
depends on its primary structure. Discovering the tertiary
structure of a protein can provide important clues about
how the protein performs its function. PSP is one of the
most important and still unsolved problem in computational
biology.
PSP
using contact maps
A contact map of a given protein P
is
a binary matrix M
such
that Mi,j
= 1
iff the physical distance between residues
i
and
j
in
the native structure is less than or equal to a
pre-assigned threshold t.
The contact map of each protein is a distinctive signature
of its folded structure. Predicting the tertiary structure
of a protein directly from its primary structure is a very
complex and still unsolved problem. An alternative and
probably more feasible approach is to predict the contact
map of a protein from its primary structure and then to
compute the tertiary structure starting from the predicted
contact map. This last problem has been recently proven to
be NP-Hard. We developed a heuristic algorithm called COMAR
(Contact Map Reconstruction) that is able to reconstruct in
a few seconds a 3D model that exactly matches the target
contact map. Our method computes an exact model for the
protein independently of the contact map threshold with
100% efficiency when tested on 1760 proteins from different
structural classes. Repeated applications of our method
(starting from randomly chosen distinct initial solutions)
show that the same contact map may admit (depending on the
threshold) quite different 3D models. Extensive
experimental results show that contact map thresholds
ranging from 10 to 18 Ångstrom allow reconstructing 3D
models that are very similar to the proteins native
structure.
In order to simulate possible scenarios of reconstruction
from predicted (and therefore highly noised) contact maps
we test the performances of COMAR on native contact maps
when a perturbation with random errors is introduced. From
our analysis we obtain that our algorithm performs better
reconstructions on blurred contact maps when contacts are
under predicted than over predicted. Moreover we give a new
version of the algorithm (FT-COMAR [Fault Tolerant-COMAR])
which can be used with incomplete contact maps. FT-COMAR
can ignore up to 75% of the contact map and still recover
from the remaining 25% entries a three dimensional
structure whose root mean square deviation (RMSD) from the
native one is less then 4 Å. Our results indicate that the
quality more than the quantity of predicted contacts is
relevant to the protein 3D reconstruction and that some
hints about “unsafe” areas in the predicted contact maps
can be useful to improve reconstruction quality. For this,
we implement a very simple filtering procedure to detect
unsafe areas in contact maps and we show that by this and
in the presences of errors the performance of the algorithm
can be significantly improved. Furthermore, we show that
both COMAR and FT-COMAR overcome previous state-of-the-art
algorithms for the same task.
Protein
Structure Selection
Protein Structure Selection (PSS), instead of
reconstructing a 3D model for the given chain, aims to
select among a given, possibly large, number of 3D
structures (called decoys) those that are closer (according
to a given notion of distance) to the original (unknown)
one. Each decoy is represented by a set of points in three
dimensions. Existing methods for solving PSS make use of
suitably defined energy functions which heavily rely on the
primary structure of the protein and on protein chemistry.
We use a completely different approach to PSS which does
not take advantage at all of the knowledge of the primary
structure of the protein but only relies on the graph
theoretic properties of the decoys graphs (vertices
represent residues and edges represent pairs of amino
acids whose euclidean distance is less than or equal to a
fixed threshold: contact maps). Even if our methods only
rely on approximate geometric information, experimental
results show that some of the graph properties we adopt
score similarly to energy-based filtering functions in
selecting the best decoys.
Our results show the principal role of geometric
information in PSS, setting a new starting point, and
filtering method, for existing energy function based
techniques.