
Director: Michael Nilges (nilges@pasteur.fr)



The structural bioinformatics research group was created in March 2001. Our research focuses on the relationship between sequence, three-dimensional structure, and function of proteins, using, among others, modelling techniques and molecular dynamics simulations. We also continue the development of software for automated NMR structure calculations, and are collaborating with structural genomics projects and the structure databases. We have established collaborations with experimental groups at the Institut Pasteur and elsewhere, for example, in atomic force spectroscopy, X-ray crystallography, and microscopy. Our work is aimed at helping in the interpretation of experimental data, and generating hypotheses that can be tested experimentally.



Structure prediction and sequence analysis

In addition to the projects described below, we are involved in the analysis of sequences in the genome of Bacillus anthracis (with the group of Michèle Mock), and we are modelling proteins that take part in the metabolism of sulphur in Bacillus subtilis (with the group of Antoine Danchin).

Analysis of the Fibronectin III domains of Titin: Interactions with Myosin

(Collaboration with Claudia Muhle-Goll, EMBL Heidelberg)

The sarcomere of striated muscle contains the thin filaments (based on actin) and the thick filaments (based on myosin). During muscle contraction, these two filaments act on each other to generate active tension. Passive tension of the stretched sarcomere is the result of the extension of the protein titin, the third type of filament. Titin is the largest protein known to date. One single polypeptide chain is attached to the Z-disk at one end and to the M-line at the other. The A-band region of Titin is mostly composed of repetitive structures (super-repeats) made up of Immunoglobulin (IG)-like and Fibronectin III (FnIII)-like domains.

The precise role of the FnIII domains is unclear, but they could present binding sites for titin to myosin. In order to understand the function of the FnIII domains of the A-band better, Michael Habeck has, in collaboration with Claudia Muhle-Goll at the EMBL, developed structural models of all 132 FnIII domains of Titin, using the sequence homology to eight FnIII-like domains with known three-dimensional structure.

After grouping the models according to their position in the super-repeat in the central region of the A-band, they analysed the domains with respect to the conservation of side-chains. They showed that the conserved residues form extended surface patches mostly on one side of the domains. Domains from outside the central regions show less surface conservation. These conserved surface residues could therefore function as attachment sites for other proteins.
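The conservation analysis described above can be illustrated with a minimal sketch. It assumes the domains are already aligned and scores each alignment column by the fraction of non-gap residues that match the most common residue. The function name, the 0.75 cut-off and the toy sequences are invented for illustration and are not the procedure or data used in the study.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return the fraction of non-gap residues
    that are identical to the most common residue in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # column contains only gaps
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(residues))
    return scores


# Toy alignment (invented sequences, not real titin domains)
toy_alignment = ["PGAPK", "PGSPR", "PG-PK", "PAAPK"]
scores = column_conservation(toy_alignment)

# Hypothetical cut-off: call a position conserved if at least 75% agree
conserved_positions = [i for i, s in enumerate(scores) if s >= 0.75]
```

Mapping the conserved positions onto the three-dimensional models is what would then reveal whether they cluster in a surface patch.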

The results of the modelling have suggested experimental studies. In collaboration with two other groups, the binding properties of fragments of Titin containing several FnIII-like domains were studied experimentally. The results indicate that the FnIII-containing fragments bind specifically to the S1 fragment of Myosin, and that they have an effect on the contraction of simple cardiac myocytes. Finally, we have proposed a model showing how the FnIII-like domains of Titin could influence the interaction between Actin and Myosin during muscle contraction.

Prediction of protein complexes in Mycobacterium tuberculosis

(Collaboration with Alfonso Valencia, Madrid, and Christos Ouzounis, Cambridge)

The formation of highly specific complexes between proteins is a condition for almost all biologically interesting processes. It is estimated that the 6000 proteins of Saccharomyces cerevisiae, for example, take part in more than 30,000 physical interactions. The three-dimensional form of the proteins and their complexes is often the key to a detailed understanding of how they achieve their specific task. In spite of technological progress in the determination of three-dimensional structures of isolated proteins, it is more difficult to obtain the same information on complexes of proteins.

Recently, structural genomics projects have been launched, with the aim of obtaining the structures of 400 of the roughly 4000 proteins of Mycobacterium tuberculosis (MTb). Exploiting the infrastructure generated for these projects, we want to try to develop fast methods to predict the structure of complexes, starting from a combination of experimental results and calculation techniques. First, a set of protein pairs needs to be defined. For this, we use the available experimental results and bioinformatics techniques, in collaboration with the groups of Alfonso Valencia (Madrid) and Christos Ouzounis (Cambridge). One possibility to predict interactions between proteins is by identifying gene fusion and fission events in a comparative analysis of several genomes (compare the HisF-HisH complex below, which is a single protein in some organisms). Christos Ouzounis and his group have identified 150 protein-protein interactions in this way for the MTb genome. With a different approach, looking for correlated mutations in multiple sequence alignments of proteins, the group of Alfonso Valencia predicted 1500 such interactions. In addition, the method used by Valencia's group can predict specific residue-residue contacts. These methods clearly cannot predict all interactions in MTb, but they provide us with a list of probable complexes.
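To give a feeling for what "correlated mutations" means, the sketch below scores how strongly two alignment columns co-vary using mutual information. This is a deliberately simple stand-in chosen for illustration; it is not the method of Valencia's group, and the function name and toy columns are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns, each
    given as a string holding one residue per sequence, in the same order."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        # p_joint / (p_a * p_b), written with raw counts
        ratio = count * n / (count_a[a] * count_b[b])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy columns (invented): one from protein A, one from protein B,
# with sequences from the same four organisms in the same order.
covarying = mutual_information("AAVV", "KKEE")    # residues change together
independent = mutual_information("AAVV", "KEKE")  # no relationship
```

A high score between a column of one protein and a column of another hints that the two positions may be in contact, which is the kind of residue-residue information that can later serve as a docking constraint.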

The prediction data will be used as constraints for docking calculations. We are in the process of developing a new docking strategy that avoids the usual simplification of neglecting protein flexibility. Benchmarks carried out by Johan Leckner in the group have demonstrated that even small structural changes between free and complexed proteins may prevent a "rigid" docking program from identifying the correct complex. Our approach to include flexibility is based on a molecular dynamics simulation method which was developed in the group and which facilitates structural transitions of the protein. A second source of constraints for the docking will be data from NMR experiments. Here, we will try to exploit exclusively data that can be quickly measured and analysed. For example, the formation of a complex between two proteins leads to changes in the chemical shifts. These changes help to identify the interface.

Raik Grünberg has started to build a database with all the experimentally known and predicted data relevant to the project (predicted and known protein-protein interactions, three-dimensional structures of proteins involved in the complexes or homologous structures from other organisms). The protein-protein interaction list will be filtered for pairs with properties favourable for experiment and calculation. Preference will be given to the modelling of proteins of small size with a known three-dimensional structure, or with a close homologue of known structure. We hope that we will be able to demonstrate that the information obtained from prediction and simple experiments, each of which would be insufficient alone to obtain a realistic three-dimensional model of a complex, can be combined to achieve this goal. The structural genomics projects will rapidly augment the number of known protein structures. Efficient methods for the derivation of structures of macromolecular assemblies from theoretical considerations and experimental data will be in high demand.
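As an illustration of how chemical shift changes can point to an interface, the following sketch combines the 1H and 15N shift differences of each residue into a single number and lists the residues above a cut-off. The weighting factor for 15N, the cut-off and all shift values are assumptions made up for this example; they are not taken from the project.

```python
import math


def shift_perturbation(free, bound, n_scale=0.2):
    """Combined 1H/15N chemical shift change per residue.

    free, bound: dicts mapping residue number -> (h_ppm, n_ppm).
    n_scale: weight applied to the 15N difference (assumed value).
    Only residues assigned in both states are compared.
    """
    result = {}
    for res in free.keys() & bound.keys():
        d_h = bound[res][0] - free[res][0]
        d_n = bound[res][1] - free[res][1]
        result[res] = math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)
    return result


def interface_candidates(perturbations, threshold):
    """Residues whose combined shift change exceeds the threshold."""
    return sorted(res for res, d in perturbations.items() if d > threshold)


# Invented shifts (ppm) for three residues, free and in the complex
free_shifts = {10: (8.20, 120.0), 11: (7.90, 118.5), 12: (8.50, 122.0)}
bound_shifts = {10: (8.21, 120.0), 11: (8.10, 119.5), 12: (8.50, 122.1)}

perturbations = shift_perturbation(free_shifts, bound_shifts)
candidates = interface_candidates(perturbations, threshold=0.1)
```

Residues flagged in this way could be passed to the docking calculation as a loose restraint, without requiring a full structure determination of the complex.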

Protein structures from experimental data

The program ARIA, which we have developed over the last few years, is one of the very few programs for automated assignment of NMR spectra to be used in practice. It has been used in the determination of about 50 NMR structures in the PDB, and is used in approximately 150 laboratories worldwide. It plays a central role in two international collaborations on software development for NMR structure determination: CCPN (Collaborative Computing Project for NMR, headed by Ernest Laue, University of Cambridge), an initiative to standardize and automate the data processing and structure determination process, and NMRQUAL, an EU-supported project to develop new methods to determine structures and evaluate their quality (collaboration with Gert Vriend, University of Nijmegen, The Netherlands; Robert Kaptein, University of Utrecht, The Netherlands; Ernest Laue, University of Cambridge; and John Ionnides, EMBL Hinxton). In addition to the automation of spectra analysis offered by ARIA, the issues of automation, quality assessment and database deposition addressed in these projects are of central importance for structural genomics projects.

In several collaborations, we are using ARIA for assignment of spectra and the modelling of structures, for example with the groups of Prof. Heinz Rüterjans (University of Frankfurt, Germany), Michael Sattler (EMBL Heidelberg), and Muriel Delepierre at the Institut Pasteur.

New algorithmic developments

ARIA accelerates the assignment of the NOE, the most important experimental data for structural work with NMR, by the use of so-called ambiguous distance restraints in an iterative structure calculation strategy. Michael Habeck, Wolfgang Rieping, and Jens Linge are working on a completely new version of ARIA. The new version will contain a fast novel algorithm for torsion angle dynamics, an algorithm for the correction of so-called spin diffusion, and methods for the validation of NMR data and calculated structures. In addition, ARIA will contain an interface to provide direct access to the BioMagResBank (BMRB) and to the CCPN project. Internally, we use XML (the eXtensible Markup Language) to encode the data, for example, the lists of chemical shifts and NOE peak lists.

We are also working on a new concept to model the NMR structures, based on Bayes' theorem. The problem of the calculation of a structure of a protein cannot be solved in a unique way, for the following reasons: first, the experimental data always contain errors; second, the experiments do not supply sufficient information to allow a complete determination of the structure; third, the mathematical models describing the relationship between the measured quantities and the three-dimensional structure of the molecule are approximate and incomplete.

The absence of a unique solution of the structure determination problem requires a probabilistic description. Bayesian probability theory supplies the only objective framework for problems where the information is incomplete. In the present context, some of the most important applications are:
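The idea behind an ambiguous distance restraint can be sketched in a few lines. An NOE peak that could arise from several proton pairs is compared with a single effective distance obtained by summing the inverse sixth powers of all candidate distances. The code is an illustration of that standard formula only, not an excerpt from ARIA; the function name and the numbers are assumptions.

```python
def effective_distance(distances):
    """Effective distance for an ambiguous restraint: the inverse sixth
    powers of all candidate proton-proton distances are summed, so that
    short distances dominate, and the sum is converted back to a length."""
    if not distances:
        raise ValueError("at least one candidate distance is required")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# One candidate pair is close (3 A), the other far away (30 A): the far
# pair contributes almost nothing, so the restraint behaves as if it were
# unambiguous.
d_eff_dominated = effective_distance([3.0, 30.0])

# Two equally close candidates give an effective distance shorter than either.
d_eff_shared = effective_distance([3.0, 3.0])
```

Because the sum is dominated by the shortest distances, a restraint of this kind can be applied before the correct assignment is known, and wrong candidates can be discarded as the structures converge during the iterations.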

  1. An evaluation of the experimental data concerning their quality
  2. The determination of the most probable structure
  3. The derivation of local measures of uncertainty
  4. The identification of structural regions that are not well defined by the data
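
For reference, the probabilistic description mentioned above rests on Bayes' theorem, written here in a generic form. The symbols are chosen for this illustration and are not taken from the group's own notation: X stands for the structure (the coordinates), D for the experimental data, and I for the prior knowledge, such as the force field.

```latex
p(X \mid D, I) \;=\; \frac{p(D \mid X, I)\; p(X \mid I)}{p(D \mid I)}
```

The likelihood p(D | X, I) encodes how well a structure explains the data, including their errors, and the prior p(X | I) encodes what is known about proteins independently of the experiment. The most probable structure maximises the posterior on the left-hand side, while the width of the posterior gives the local measures of uncertainty.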

Michael Habeck and Wolfgang Rieping are in the process of developing a new structure calculation algorithm, which uses a combination of torsion angle dynamics and the Monte Carlo method. The method has so far been tested on small proteins with artificial restraints, with encouraging results.

Molecular dynamics

Proteins need to move to perform their functions in the cell. Experimental protein 3D structures, in particular from X-ray crystallography, usually give us only a single conformation. There is no experimental technique available to observe protein motion at atomic detail. Therefore, we need to resort to theoretical techniques to understand protein motions and hence function. Molecular dynamics simulations can give us important information not only on the dynamics of the molecule on an atomic scale, but also on the internal forces acting within the molecule or between molecules.

We use the AMBER collection of molecular dynamics programs for all our simulation projects. In order to render the calculations more efficient, the interaction of the molecules with water is calculated in a simplified manner: AMBER provides a physically reasonable correction for the solvent that is not treated explicitly. While this method does not give all the details of a simulation with explicit solvent, a quantitative analysis of the results is possible. The results are promising: our calculations are, for example, sensitive enough to show the effects of point mutations.

Simulation of forced unfolding of a Spectrin repeat

(Collaboration with the groups of Matti Saraste, structural and computational biology programme, EMBL Heidelberg; Heinrich Hörber, cellular biophysics programme, EMBL Heidelberg; and Pierre-François Lenne, Université de Marseille)

In red blood cells, Spectrin is the major component of the cellular membrane skeleton, a system of proteins linked together under the membrane. This system maintains the shape of the cell and at the same time provides a high degree of elasticity. Red blood cells are under large mechanical stress, and many haemolytic diseases are caused by mutations in Spectrin. The protein has probably evolved from alpha-actinin and is composed of two heterodimeric proteins, which assemble as a tetramer in a rod-like structure. The alpha and beta monomers are principally composed of multiple repeats of the "spectrin repeat" domain (20 and 16, respectively, in mammalian erythrocytes). Heinrich Hörber's group at the EMBL in Heidelberg studies the unfolding of the spectrin repeat with atomic force spectroscopy. Surprisingly, and in contrast to other domains studied with the same method, their results suggest a half-extended intermediate, which can be adopted by the molecule in response to the external force.

The experiments cannot give any direct information on the nature and structure of the intermediate. Our molecular dynamics simulations of the forced unfolding of Spectrin try to illustrate the experimental results at atomic detail. In order to make the calculations sufficiently fast, we had to simulate the unfolding at a much faster rate than in the experiment. In spite of this limitation, we hope that we can identify the unfolding steps and most important intramolecular interactions along the unfolding pathway.
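One common way to mimic an atomic force experiment in a simulation is to attach a harmonic spring to one end of the molecule and move the other end of the spring at constant velocity. The sketch below shows only the force law of such a scheme. It is an assumption made for illustration and not a description of the protocol actually used here; the function name, parameter names and values are hypothetical.

```python
def pulling_force(spring_constant, velocity, time, extension):
    """Force exerted by a harmonic spring whose far end has moved a
    distance velocity * time, while the pulled atom has moved by
    `extension` along the pulling direction."""
    return spring_constant * (velocity * time - extension)


# Hypothetical numbers in arbitrary but consistent units: the spring end
# has travelled 5.0 length units while the molecule has extended by only
# 3.0, so the spring is stretched by 2.0.
force = pulling_force(spring_constant=2.0, velocity=0.5, time=10.0,
                      extension=3.0)
```

As long as the molecule resists, the spring stretches and the force rises; when a part of the structure gives way, the extension jumps and the force drops. Peaks in the resulting force-extension curve are therefore the quantity that can be compared with the experimental spectra.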

Protein dynamics is chaotic, and one cannot expect that the protein unfolds exactly the same way every time. We therefore have to simulate the unfolding several times before we can draw any conclusions. Raik Grünberg has calculated and analysed 11 unfolding trajectories of one Spectrin repeat. In about half of the trajectories, an interesting re-orientation of the helices leads to compact and mechanically stable structures. These intermediates agree well with the experimental results from the atomic force spectroscopy. Their formation depends on the local unfolding of one of the helices at a specific point. Based on preliminary results from our simulations, we designed a mutant in which a stabilization of the helix would be expected to prevent breaking of the helix. The peak due to the intermediate indeed disappeared in atomic force spectra for this mutant. We also performed calculations for the mutant, and in agreement with experiment we observed that the force-resistant intermediates disappeared. We are therefore confident that our simulations represent a realistic model of the unfolding of a Spectrin repeat.

From the results one can conclude that the molecule has evolved to react in several stages to external unfolding forces. In our simulations, we observe a variety of unfolding pathways, which, in contrast to simulations performed on other molecules, involve non-native topologies. This observation has important implications for the understanding of the experimental results. Up to now, the interpretation of atomic force spectroscopy experiments was usually based on the hypothesis of an all-or-nothing event. Our simulations will help us not only to understand the mechanical properties of Spectrin, but also to find new interpretations for experimental results.

Ligand influence on the dynamics of a protein: Ran-GTP versus Ran-GDP

(Collaboration with the group of Philippe Bastiaen, cellular biophysics programme, EMBL Heidelberg)

The GTPase Ran belongs to the Ras super-family of proteins. It is implicated in many cellular activities, such as the cell cycle, DNA replication, chromosome organisation, and the maturation of RNA. Two activities of Ran are particularly well studied: on the one hand, its role in the nucleocytoplasmic transport of proteins (schematically, RanGDP versus RanGTP in the nucleus) and on the other hand its implication in the nucleation and the development of microtubules during mitosis.

Philippe Bastiaen's group at the EMBL studies the dynamics of the interaction between proteins implicated in signal transduction in the cell. The group has developed new optical microscopy techniques that allow them to follow in real time the fate of a labelled protein in the cell in which it is expressed. For her study of the influence of Ran on the assembly of microtubules, Gertrude Blunt in Philippe Bastiaen's group is in the process of transfecting cells with a mutant of Ran coupled to a special mutant of GFP (Green Fluorescent Protein), in order to follow in vivo the state of Ran in different cellular compartments. It is essential for the project that the insertion of GFP alters neither the capacity of Ran to bind to nucleotides nor its interaction with other proteins. The GFP mutant used in the experiments is sensitive to the dynamical properties of the region of the protein into which it is inserted. To use GFP as a biological sensor that distinguishes Ran.GDP from Ran.GTP, it was necessary to identify the regions of Ran that show different dynamics in the two states.

X-ray crystal structures for both states of Ran are available (at 2.3 Å resolution for Ran.GDP and 2.9 Å resolution for Ran.GppNHP, an analogue of GTP that cannot be hydrolysed). The information on dynamics present in the X-ray crystal structures (the temperature factors) could not be used to define these regions, mostly due to the limited quality of one of the structures. We have therefore proposed to supplement this information by theoretical means and to determine, from molecular dynamics simulations, the regions of Ran whose dynamical behaviour differs significantly between the two states. Nathalie Duclert-Savatier and Raik Grünberg have calculated trajectories of Ran bound to the two nucleotides, starting from the two crystal structures. We have used 750 ps of dynamics to characterize the differences in the fluctuations of the atoms around their average positions, and could in this way identify regions for an insertion of GFP. The regions we have proposed take into account the additional constraints that neither the structure of Ran nor its binding properties should be significantly modified, and thus are distant from the switch I and switch II regions. The switch regions are highly conserved in the Ras super-family; the switch I region undergoes large conformational changes depending on the bound nucleotide, and the switch II region takes part in interactions with other proteins. The first mutations based on our suggestions are being introduced.
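The fluctuation analysis can be sketched as follows. For each atom, the root-mean-square deviation from its average position over the trajectory is computed, and the values for the two nucleotide states are subtracted. The sketch assumes that all frames have already been superimposed on a common reference; the function names and the toy coordinates are invented for illustration.

```python
import math


def rmsf(trajectory):
    """Root-mean-square fluctuation of each atom around its mean position.

    trajectory: list of frames, each frame a list of (x, y, z) tuples with
    one entry per atom. Frames are assumed to be superimposed already.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one frame")
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][k] for frame in trajectory) / n_frames
                for k in range(3)]
        mean_sq_dev = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(mean_sq_dev))
    return result


def fluctuation_difference(rmsf_state_a, rmsf_state_b):
    """Per-atom difference in fluctuation between two states (b minus a)."""
    return [b - a for a, b in zip(rmsf_state_a, rmsf_state_b)]


# Toy trajectories with two atoms and two frames each (invented numbers)
state_a = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
state_b = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]

difference = fluctuation_difference(rmsf(state_a), rmsf(state_b))
```

Atoms with a large difference, here the second atom, belong to regions that are more mobile in one state than in the other and are therefore candidates for the insertion of a dynamics-sensitive probe.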

Transport of NH3 in Imidazole Glycerol Phosphate Synthase

(Collaboration with the group of Matthias Wilmanns, EMBL Hamburg, and Hannes Ponstingl, EBI Hinxton)

The biosynthesis pathway of histidine starts with the condensation of ATP and PRPP (5-phosphoribosyl 1-pyrophosphate) and requires 11 enzymatic reactions. The glutaminase-synthase bi-enzyme complex, or IGP (Imidazole Glycerol Phosphate) synthase, comprises the two subunits HisH and HisF. The subunit HisH is the glutaminase, which hydrolyses glutamine to glutamate and ammonia. The NH3 is then transported, without hydrolysis, to the active site of the synthase HisF, which adds it to its substrate PRFAR (N'-[(5'-Phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide), to produce ImGP (Imidazole glycerol phosphate) and AICAR (5-aminoimidazole-4-carboxamide ribonucleotide). The transport of NH3 and the coordination of the catalytic activities over a large distance are very interesting biological processes. The enzyme is also a potential target for drug development since the histidine pathway does not occur in mammals.

The group of Matthias Wilmanns at the EMBL has recently determined the X-ray crystal structure of the IGP-synthase complex from the thermophilic organism Thermotoga maritima. We have started collaborating in order to analyse the dynamic behaviour of the HisF subunit alone and in complex with HisH. The aim is to study NH3 transport and the communication between the active sites by molecular dynamics simulations. The results of the simulations will help to understand the results of biochemical experiments in which the role of different residues has been studied.

Together with Hannes Ponstingl at the EBI in Hinxton, Nathalie Duclert-Savatier has calculated several molecular dynamics trajectories of the synthase subunit HisF, alone and in complex with the glutaminase subunit HisH. She is now analysing the trajectories. Of particular interest is the influence of the HisH subunit on the dynamics of HisF, since this could give a hint towards understanding the coordination of the reactions. The next step is to place the NH3 at the entry of the putative pathway through HisF to its active site. The structure of HisF is very suggestive: it is a so-called TIM-barrel, with the active site opposite to the HisH interface. In contrast to other TIM-barrel proteins, the central part of the barrel appears open, forming a channel large enough for the NH3 molecule to pass through. Since we cannot hope that the NH3 will pass spontaneously through the channel in the calculations, we will apply an appropriate force, similar in spirit to the forced unfolding calculation of the Spectrin repeat described above.

Building up the computing environment of the structural bioinformatics group

Tru Huynh has been occupied with building up the computer environment of the unit. Each user in the group has her or his own computer (PC under Linux or Macintosh). A file server with a capacity of 500 GB was installed for the whole bioinformatics centre to simplify the administration of the user directories. The disks on the server are backed up every day by the central backup robot of the institute in the SIS. The computing needs of the unit are served by a PC cluster that is in the process of being built up, which at present comprises 8 nodes connected by high-speed Ethernet (100 Mbit/s) with a high latency (100 microseconds). We plan to supplement this in early 2002 with 20 further bi-processor nodes or 40 single-processor nodes. We have chosen this architecture for the following reasons:

  1. Each processor delivers approximately 40% of the performance of one of the bi-processor Compaq servers used in the Plateau Technique Annotation (PTA)
  2. The cost is 20 times lower
  3. The applications in the group are either trivially parallel (homology modelling and structure calculation from NMR data typically require many identical calculations), or the time of communication is small compared to the calculation time (such as molecular dynamics calculations with AMBER or CHARMM). For the latter, we observed speedups around 3.2 for 4 processors or 6 for 8 processors.

For molecular dynamics calculations with AMBER, a user thus has a peak performance of approximately three times that of a server at the PTA, at half the cost. The principal limitation of the architecture is that the code has to be parallelized or parallelizable.

In addition, Tru Huynh has helped in the installation and administration of computers in other services or groups: computers for the teaching lab, the Plateau Technique for DNA chips, the Plateau Technique for sequence annotation, the Institut Pasteur collection, and the unit of Structural Biochemistry. He has also started technical collaborations with the supplier and manufacturer of the PCs.
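The arithmetic behind these figures can be made explicit with a small sketch. It takes the numbers quoted above at face value (speedups of 3.2 on 4 processors and 6 on 8 processors, and 40% of the reference server's performance per processor). The function names are invented for this illustration.

```python
def parallel_efficiency(speedup, n_processors):
    """Fraction of the ideal linear speedup that is actually achieved."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return speedup / n_processors


def relative_performance(speedup, per_processor_fraction):
    """Performance relative to one reference server, given the speedup over
    a single cluster processor and that processor's performance expressed
    as a fraction of the reference server."""
    return speedup * per_processor_fraction


# Speedups quoted in the text: processors -> observed speedup
observed_speedups = {4: 3.2, 8: 6.0}
efficiencies = {n: parallel_efficiency(s, n)
                for n, s in observed_speedups.items()}

# Eight nodes at 40% of a reference server each
peak_8 = relative_performance(8, 0.4)        # ideal, linear scaling
measured_8 = relative_performance(6.0, 0.4)  # with the observed speedup
```

With these inputs the efficiency is 80% on 4 processors and 75% on 8, and the 8-node cluster corresponds to roughly 3.2 reference servers at peak, or about 2.4 with the observed speedup.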


Scientific collaborations

Institut Pasteur

Pedro Alzari

Antoine Danchin

Michèle Mock

Nicolas Wolff, Muriel Delepierre


Thérèse Malliavin, Institut de Biologie Physico-Chimique, Paris

Marc-André Delsuc, Centre de Biochimie Structurale, Montpellier

Pierre-François Lenne, Institut Fresnel, Marseille



Felician Dancea, Heinz Rüterjans, University of Frankfurt, Germany

Heinrich Hörber, EMBL Heidelberg, Germany

Hartmut Oschkinat, FMP Berlin, Germany

Michael Sattler, EMBL Heidelberg, Germany

Matthias Wilmanns, EMBL Hamburg, Germany

Roger Abseher, Boehringer-Ingelheim Vienna, Austria

Alfonso Valencia, Madrid, Spain

Jarri Ylanen, University of Oulu, Finland

Robert Kaptein, University of Utrecht, The Netherlands

Chris Spronk, Gert Vriend, University of Nijmegen, The Netherlands

Cornelis Hilbers, University of Nijmegen, The Netherlands

Rasmus Fogh, Ernest Laue, University of Cambridge, United Kingdom

Christos Ouzounis, EBI Hinxton, United Kingdom

Vim Vranken, John Ionnides, EBI Hinxton, United Kingdom

Jens Meiler, David Baker, University of Washington, USA

Jurgen Doreleijers, Eldon Ulrich, BioMagResBank, Madison, USA

John Westbrook, RCSB, USA





Personnel

Linge, Jens, post-doc

Leckner, Johan, post-doc

Grünberg, Raik, PhD student

Habeck, Michael, PhD student

Rieping, Wolfgang, PhD student

Huynh, Tru-Quang, engineer

Duclert-Savatier, Nathalie, engineer

