Structural Bioinformatics

  Director : Nilges, Michael (nilges@pasteur.fr)



Bioinformatics derives knowledge from computer analysis of biological data. Structural bioinformatics is concerned with computational approaches to predict and analyse the spatial structure of proteins and nucleic acids.

A deeper understanding of many properties of proteins cannot be gained from their primary sequence but necessitates the knowledge of their three-dimensional structure. This is the major driving force behind structural genomics efforts, which can thus be seen as part of functional genomics. Our own structural genomics efforts concentrate on development of computational approaches to speed up and automate the analysis of NMR data in order to obtain three-dimensional structures.

We use molecular modelling techniques to predict protein structure and to analyse protein function. For example, we study protein-protein interactions predicted on the sequence level in more detail on the structure level. Functionally important motions of proteins form another important example. Experimental protein structures usually give us only a single conformation. We study large conformational changes that proteins may undergo due to forces (for example, during the formation of a complex, or in atomic force spectroscopy experiments). We also use molecular dynamics (MD) simulations to get a detailed understanding of the interaction of proteins with their ligands.



Predicting protein structure and protein interactions

Molecular modelling is an irreplaceable tool for obtaining structural information in the absence of direct experimental information. It is most successful if the 3D structure of a homologous protein is known (homology modelling, comparative modelling). Rachid Maroun and other members of the group have started to implement a homology modelling pipeline, which we hope will be useful in the annotation of genomes sequenced at the Institut Pasteur. We are involved in molecular modelling projects with several groups at the Institut Pasteur.

Prediction of protein-protein interactions

The formation of highly specific complexes between proteins is a prerequisite for almost all biologically interesting processes. The three-dimensional form of the proteins and their complexes is often the key to a detailed understanding of how they achieve their specific task. In spite of technological progress in the determination of three-dimensional structures of isolated proteins, it remains more difficult to obtain the same information on complexes of proteins. We therefore try to obtain detailed structural information on protein complexes from predictions or qualitative experimental data by molecular modelling. As a first application, we plan to use predictions of protein interactions made by comparative genome analysis by the groups of Alfonso Valencia (Madrid) and Christos Ouzounis (Cambridge) for the genome of Mycobacterium tuberculosis (MTb).

For the modelling of the complexes, we are developing a new docking strategy that avoids the usual rigid-body simplification. This is important since even small structural changes between free and complexed proteins may prevent a rigid docking program from identifying the correct complex. Our approach is to first generate an ensemble of structures with a molecular dynamics simulation method we developed (PCR-MD; see below).

The whole project is aimed not at modelling one or a few complexes, but at modelling as many as possible in a mostly automated way. We have therefore spent most of the last year optimizing and automating all the necessary steps:

• 3D structures from the protein data bank are pre-processed automatically for molecular dynamics and docking calculations;

• We have evaluated several docking programs with a standard benchmark of known complexes;

• We have implemented and tested new scoring schemes based on free energy estimates from molecular dynamics simulations of the complexes;

• We have further developed and sped up the PCR-MD method to sample the conformational space around the free proteins;

• We are working on a systematic investigation of the dynamic behaviour of the binding interfaces; the aim is to identify binding interfaces from their dynamic behaviour.

The "pipeline" is now working satisfactorily. In parallel, we plan to exploit the similarity of the predicted contacts to NMR data. The modelling can therefore also be done with ARIA (see below). For validation of the complexes, collaborations are planned with experimental groups (NMR and calorimetry).

In parallel, Raik Grünberg has developed a new way to organize all the knowledge about the protein sequences we want to model, using the semantic web concept. The program makes it possible to link information from different databases in an easy and flexible way.

Molecular dynamics and protein function

Molecular dynamics simulations can show the motion in atomic detail, and therefore supplement the experimental information of the 3D coordinates. The disadvantage is that even on powerful computers the simulations are very time-consuming. This is a serious limitation since many biologically relevant motions take place on long timescales and have large amplitudes. We use a combination of approaches to overcome this limitation:

• We simplify the interaction of the molecules with the surrounding water, for example by using the so-called Generalized Born approximation;

• We use artificial external forces to induce certain motions of interest. This approach has been very successful in combination with the Generalized Born approximation: calculations of the forced unfolding of a coiled-coil domain from the protein spectrin, conducted by Raik Grünberg (Altmann et al., Structure Fold Des 2002; 10:1085-1096), showed, for example, that the results are sensitive to point mutations, in qualitative agreement with experiment. Sabrina Serin has recently extended the study to an all-beta-sheet domain of the muscle protein titin as part of her traineeship in our group;

• We have developed a new simulation method (termed principal component restraint-MD, or PCR-MD). The method enhances overall correlated backbone motions, which have very low frequencies and are therefore difficult to observe in molecular dynamics simulations.
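The role of principal components can be illustrated on a toy trajectory. The following is a minimal sketch, not the PCR-MD implementation: it only shows how the dominant correlated motion can be extracted from a set of superimposed coordinate frames. The function name and the synthetic data are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def principal_components(frames):
    """Return eigenvalues (descending) and eigenvectors of the coordinate covariance.

    frames: array of shape (n_frames, n_coords), assumed already superimposed.
    """
    centered = frames - frames.mean(axis=0)
    cov = centered.T @ centered / (len(frames) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic data (hypothetical): 200 frames, 6 coordinates,
# one slow collective mode plus small uncorrelated noise.
rng = np.random.default_rng(0)
mode = np.array([1.0, 1.0, 0.0, -1.0, -1.0, 0.0])
mode /= np.linalg.norm(mode)
amplitude = 3.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
frames = amplitude[:, None] * mode[None, :] + 0.1 * rng.normal(size=(200, 6))

eigvals, eigvecs = principal_components(frames)
overlap = abs(eigvecs[:, 0] @ mode)  # how well the first component recovers the mode
```

In this toy case the first principal component coincides almost exactly with the collective mode that was built into the data. A restraint acting along such components is one way to picture how correlated backbone motions could be enhanced.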

Transport of NH3 in Imidazole Glycerol Phosphate Synthase

The biosynthesis pathway of histidine starts with the condensation of ATP and PRPP (5-phosphoribosyl 1-pyrophosphate) and requires 11 enzymatic reactions. The bi-enzyme complex of glutaminase-synthase, or IGP (Imidazole Glycerol Phosphate) synthase, comprises the two subunits HisH and HisF. The subunit HisH is the glutaminase, which hydrolyses glutamine to glutamate and ammonia. The NH3 is then transported, without hydrolysis, to the active site of the synthase HisF, which adds it to its substrate PRFAR (N'-[(5'-Phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide), to produce ImGP (Imidazole glycerol phosphate) and AICAR (5-aminoimidazole-4-carboxamide ribonucleotide). The transport of NH3 and the coordination of the catalytic activities over a large distance are very interesting biological processes. The enzyme is also a potential target for drug development since the histidine pathway does not occur in mammals.

Starting from the X-ray crystal structure of the IGP-synthase complex from the thermophilic organism Thermotoga maritima, determined by the group of Matthias Wilmanns at the EMBL in Hamburg, Germany, we are analysing the dynamic behaviour of HisF and the HisH-HisF complex by molecular dynamics simulations. Of particular interest are the pathway of NH3 through HisF to its active site, and the coordination of the reactions of HisF and HisH. In order to study the passage of NH3 through HisF, Nathalie Duclert-Savatier used steered MD calculations, with an appropriate force to pull NH3 through the putative pathway (a central channel in the TIM-barrel fold of HisF). She established that the only energetic barrier for the passage is the so-called charge gate, consisting of four amino acids at the entry of the channel (two glutamates, one arginine, and one lysine residue; see figure). In the calculations, the gate opens in response to the push of the NH3 molecule, usually by a sideways movement of the lysine residue.
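The principle of a steered pulling calculation can be sketched in one dimension. The example below is an illustrative toy model, not the simulation protocol used for HisF: a particle is dragged by a harmonic spring whose anchor moves at constant velocity over a single Gaussian barrier that stands in for a gate. The barrier shape, all parameter values, and the overdamped noise-free dynamics are assumptions.

```python
import math

def barrier_force(x, height=5.0, width=0.5):
    """Force from a Gaussian barrier U(x) = height * exp(-x^2 / (2 width^2))."""
    return height * x / width**2 * math.exp(-x**2 / (2.0 * width**2))

def steered_pull(k=50.0, v=0.5, x0=-3.0, x_end=3.0, dt=1e-3, friction=1.0):
    """Pull a particle with a spring whose anchor moves at constant velocity v.

    Overdamped, noise-free dynamics: friction * dx/dt = F_barrier + F_spring.
    Returns the final position, the peak spring force and the accumulated work.
    """
    x, anchor, work, peak = x0, x0, 0.0, 0.0
    while anchor < x_end:
        f_spring = k * (anchor - x)
        x += dt * (barrier_force(x) + f_spring) / friction
        work += f_spring * v * dt      # work done by the moving anchor
        peak = max(peak, f_spring)
        anchor += v * dt
    return x, peak, work

x_final, peak_force, work = steered_pull()
```

The peak spring force is reached while the particle climbs the barrier, which is how a pulling calculation of this kind locates the energetic obstacle along a pathway.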

To study the spontaneous opening and closing of the gate, we use our PCR-MD method. In these simulations, the opening and closing is not caused by forcing the ligand through the gate, but by enhancing overall backbone motions. The simulations will be helpful to understand biochemical experiments in which the role of different residues has been studied.

Modelling and structural genomics by NMR

Automated structure refinement with ARIA

Our program ARIA is one of the very few programs for automated analysis of NMR data of biological macromolecules used in practice. It has helped in the determination of about 50 NMR structures in the PDB, and is used in more than 150 laboratories worldwide. ARIA accelerates the assignment of the most important experimental data for structural work with NMR, the NOE, by the use of so-called ambiguous distance restraints in an iterative structure calculation strategy.
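An ambiguous distance restraint is commonly evaluated with an r^-6 summation over all assignment possibilities of a peak. The sketch below illustrates this definition only; it is not ARIA code, and the function names and the example distances (in Å) are assumptions.

```python
def effective_distance(distances):
    """Effective distance over all assignment possibilities (r^-6 summation)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

def restraint_violation(distances, upper_bound):
    """Amount by which the effective distance exceeds the upper bound (0 if satisfied)."""
    return max(0.0, effective_distance(distances) - upper_bound)

# One close and one distant proton pair: the close pair dominates,
# so the restraint is satisfied without deciding the assignment in advance.
d_eff = effective_distance([3.0, 20.0])
violation = restraint_violation([3.0, 20.0], upper_bound=5.0)
```

Because the shortest distance dominates the sum, a restraint of this form is satisfied as soon as any one of the possible assignments is satisfied, which is what allows unassigned peaks to be used during an iterative calculation.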

In several collaborations, we are using ARIA for the assignment of spectra and the modelling of structures, for example with the groups of Hartmut Oschkinat (Institute for Molecular Pharmacology, Berlin, Germany), Michael Sattler (EMBL Heidelberg), and Murielle Delepierre and Veronique Stoven at the Institut Pasteur.

We have recently released version 1.2 of the software. Michael Habeck, Wolfgang Rieping, and Jens Linge have almost finished a complete rewrite of ARIA, which will facilitate future developments. We are also developing ARIA further for other applications (homology modelling, docking).

In the past year, we have implemented and tested new functionality of ARIA:

• We have contributed to CCPN (Collaborative Computing Project for NMR, headed by Ernest Laue, University of Cambridge), an initiative to standardize and automate the data processing and structure determination process;

• In the NMRQUAL project, an EU-supported project (collaboration with Gerd Vriend, University of Nijmegen, The Netherlands; Robert Kaptein, University of Utrecht, The Netherlands; Ernest Laue, University of Cambridge; and John Ionides, EMBL Hinxton), we are adding capabilities to ARIA to evaluate protein structure quality;

• In collaboration with Mark Williamson (UCL London) and Alexandre Bonvin (University of Utrecht) we have further developed the force field used in NMR structure refinement with ARIA, and showed that a very short refinement of NMR structures with few water molecules gives results similar to those of much slower refinement schemes;

• In collaboration with Michele Fossi (visiting from the Institute for Molecular Pharmacology, Berlin) we have implemented and tested a new method to distinguish noise from real data during the structure calculation;

• An interface to the BioMagResBank (BMRB) simplifies the deposition of data in the database;

• Julie Foch (a trainee from Paris VI) developed a new user interface to ARIA, based on Java and XML;

• Internally, we use XML (the eXtensible Markup Language) to encode the data, for example, the lists of chemical shifts and NOE peak lists.
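As an illustration of the last point, a list of chemical shifts can be written to XML and read back with a few lines of code. This is a sketch under stated assumptions: the element and attribute names (chemical_shift_list, shift, residue, atom, ppm) are hypothetical and do not reproduce the actual ARIA data format.

```python
import xml.etree.ElementTree as ET

def shifts_to_xml(shifts):
    """Encode (residue number, atom name, shift in ppm) tuples as an XML string."""
    root = ET.Element("chemical_shift_list")  # hypothetical element name
    for residue, atom, ppm in shifts:
        ET.SubElement(root, "shift", residue=str(residue), atom=atom, ppm=f"{ppm:.3f}")
    return ET.tostring(root, encoding="unicode")

def xml_to_shifts(text):
    """Decode the XML string produced by shifts_to_xml back into tuples."""
    root = ET.fromstring(text)
    return [
        (int(e.get("residue")), e.get("atom"), float(e.get("ppm")))
        for e in root.findall("shift")
    ]

example = [(12, "HA", 4.312), (12, "HN", 8.105)]
encoded = shifts_to_xml(example)
decoded = xml_to_shifts(encoded)
```

A self-describing text format of this kind keeps the data readable by both the calculation engine and a separate user interface.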

Inferential structure determination

In a parallel effort, we are developing an entirely new concept to model the NMR structures, based on Bayes' theorem. Structure determination from NMR data is an inference problem: the measured quantities are noisy and mostly incomplete and therefore insufficient to determine the structure uniquely. The objective of structure determination is to explore all regions of conformational space compatible with the incomplete experimental information at hand. This can be achieved with Bayes' theorem; however, although its use for experimental structure determination was already suggested years ago, no implementation of this concept exists. One difficulty is that one needs to evaluate the "complete conformational space" in order to apply the theorem, which is, for biological macromolecules, enormous.
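The reasoning can be made concrete on a deliberately small problem, where the whole "conformational space" is a single distance on a grid. The sketch below is an assumption-laden toy, not the group's method: it uses a Gaussian error model, a flat prior, and invented measurements to show how Bayes' theorem turns noisy data into a probability for every candidate value.

```python
import math

def grid_posterior(grid, data, sigma, prior=None):
    """Normalized posterior over candidate values on a grid (Bayes' theorem)."""
    if prior is None:
        prior = [1.0] * len(grid)   # flat prior (assumption)
    # Gaussian log-likelihood of all measurements for each candidate value
    log_like = [-sum((d - x) ** 2 for d in data) / (2.0 * sigma ** 2) for x in grid]
    shift = max(log_like)           # subtract the maximum for numerical stability
    weights = [p * math.exp(l - shift) for p, l in zip(prior, log_like)]
    total = sum(weights)
    return [w / total for w in weights]

grid = [2.0 + 0.05 * i for i in range(81)]   # candidate distances from 2.0 to 6.0
data = [3.9, 4.2, 4.0]                        # invented noisy measurements
posterior = grid_posterior(grid, data, sigma=0.3)
best = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

Exhaustive enumeration works here only because the space is one-dimensional. For a macromolecule the same posterior has to be explored by sampling.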

Michael Habeck and Wolfgang Rieping have now implemented the concept for structure determination of macromolecules from NMR data. To sample the conformational space, they combined several calculation methods from theoretical statistical physics (a combined Hybrid-Monte-Carlo/Gibbs-Sampler is used with a parallel Replica-Monte-Carlo scheme). This runs very efficiently on the PC cluster of the group.
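In replica schemes, several copies of the system are simulated at different temperatures and periodically exchanged. The sketch below shows the standard Metropolis criterion for such an exchange; it is a textbook form and may differ in detail from the scheme implemented in the group. The function name and the numerical values are illustrative assumptions.

```python
import math

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    beta_* are inverse temperatures, energy_* the current energies.
    P = min(1, exp((beta_i - beta_j) * (energy_i - energy_j)))
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

# Cold replica (beta = 1.0) stuck at a higher energy than the hot one: always swap.
p_downhill = swap_probability(1.0, 0.5, 10.0, 8.0)
# Cold replica already at the lower energy: swap only with probability exp(-1).
p_uphill = swap_probability(1.0, 0.5, 8.0, 10.0)
```

Exchanges of this kind let conformations found at high temperature migrate to the low-temperature replica, which helps the sampling escape local minima.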

There are many benefits in using a rigorous statistical approach to structure determination. Since the various sources leading to uncertainties in structural coordinates are modelled explicitly, statistically meaningful variances and correlations of atomic coordinates can be calculated. The quality and consistency of the experimental data are assessed by means of confidence intervals. In the probabilistic framework, all parameters that need to be estimated (for example, scale factors and weights) are given a strict and intelligible interpretation.

Keywords: Protein structure, protein function, protein dynamics, molecular recognition, sequence analysis



Researchers
Maroun, Rachid, INSERM (CR1, maroun@pasteur.fr)
Chau, Pak-Lee, IP (CR, pc104@pasteur.fr)

Scientific trainees
Linge, Jens, post-doc
Leckner, Johan, post-doc
Grünberg, Raik, PhD student
Habeck, Michael, PhD student
Rieping, Wolfgang, PhD student
Fossi, Michele (January-August)
Foch, Julie, student (April-September)
Serin, Sabrina (February-March)

Other personnel
Huynh, Tru-Quang, engineer
Duclert-Savatier, Nathalie, engineer

Activity Reports 2002 - Institut Pasteur
