Genomic analysis

Director: Ivan MOSZER (moszer@pasteur.fr)



Technical Platform 4 "Annotation" (PT4) covers a range of activities: assembly and initial annotation of new genome sequences, curation of annotated data collections, software development for genomic databases, genome analysis, and molecular phylogeny. A large fraction of this work is performed in collaboration with other groups of the Institut Pasteur: research units, the Génopole, the computing center, and the teaching department.



Assembly and annotation of new genome sequences (L. Frangeul, B. Giletti, S. Mativet)

We are developing the CAAT-Box software ("Contig-Assembly and Annotation Tool-Box"), which contains a set of methods for following up the assembly phases of a genome sequencing project, and for initiating the annotation phase as early as the finishing step (for further details about CAAT-Box, see prior activity reports). Several collaborations with groups involved in large-scale sequencing projects are maintained through this software:

  • Collaboration with the Unité de Génétique Moléculaire des Levures (B. Dujon): This group has undertaken the complete genome sequencing of Candida glabrata, in the framework of a Génopole project. We assembled 180,000 sequences produced by both the Institut Pasteur Génopole (TP1, C. Bouchier) and the Évry Génopole, and thus obtained over 700 contigs. In collaboration with TP1, we corrected low-quality regions of these contigs and developed several strategies to close the gaps. We obtained a hundred contigs (13 Mbases) spanning almost all of the 13 chromosomes. In collaboration with the Laboratoire Bordelais de Recherche en Informatique (Université Bordeaux 1, D. Sherman), CAAT-Box was used for the annotation of the main contigs.

  • Collaboration with the Unité des Staphylocoques (N. El Solh): Following the public release of the complete genome sequences of seven strains of Staphylococcus aureus (N315, Mu50, MW2, MRSA, MSSA, 8325, COL), we have undertaken an exhaustive comparison of the 14,000 CDS. This allowed us to build a list of significant CDS: genes that diverge between strains, or genes potentially involved in the pathogenicity of the bacterium. We then used CAAT-Box to determine the sequences of specific primers, in order to design a macro-array containing 600 CDS.

  • Transversal Research Program Pasteur-Lille (C. Locht): CAAT-Box was used to define all the primers required to build a complete macro-array for Bordetella pertussis.

  • Collaboration with the Unité des Cyanobactéries (N. Tandeau de Marsac): CAAT-Box was used to follow up the first assembly steps of the genome of Microcystis aeruginosa.
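The strain comparison described above for Staphylococcus aureus can be illustrated with a minimal sketch. It assumes that the CDS have already been grouped into families of similar genes by some prior sequence comparison; the function name, the family labels and the toy data are hypothetical and do not reflect how CAAT-Box or the actual analysis was implemented.

```python
from collections import defaultdict

def split_core_and_variable(cds_table, strains):
    """Separate gene families found in every strain from those missing in some.

    cds_table: iterable of (strain, family) pairs, one per CDS.
    strains:   list of all strain names under comparison.
    Returns (core, variable), where variable maps each family to the
    sorted list of strains in which it was NOT found.
    """
    present = defaultdict(set)
    for strain, family in cds_table:
        present[family].add(strain)

    all_strains = set(strains)
    core = sorted(f for f, found in present.items() if found == all_strains)
    variable = {
        f: sorted(all_strains - found)
        for f, found in present.items()
        if found != all_strains
    }
    return core, variable

# Hypothetical toy data: three strains, three made-up gene families.
strains = ["N315", "Mu50", "MW2"]
cds_table = [
    ("N315", "famA"), ("Mu50", "famA"), ("MW2", "famA"),
    ("N315", "famB"), ("Mu50", "famB"),
    ("MW2", "famC"),
]
core, variable = split_core_and_variable(cds_table, strains)
```

Families in `variable` would be the natural candidates for a list of strain-diverging CDS; sequence-level divergence within shared families would need a separate comparison step.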

Annotated genomic databases (C. Boursaux-Eude, J.-C. Camus, L. Hummel, A. Marcel, M. Pryor)

Two additional bacterial genomes (Streptococcus pneumoniae R6 and Tigr4) were integrated into a GenoList-like database, named StreptoPneumoList. Likewise, in the framework of a Transversal Research Program with the Institut Pasteur of Lille (C. Locht), the genomes of three organisms of the Bordetella group (B. pertussis, B. parapertussis, B. bronchiseptica) will soon be imported into new GenoList databases.

To facilitate the integration of bibliographical references into the GenoList databases, a new tool, BiblioDB, was developed. It allows the user to import all the references related to a given organism (e.g. through an EndNote file), and to link each reference to the relevant genes through a dedicated interface. These relationships are then integrated into the corresponding GenoList database.

This tool was used for the reannotation of mycobacterial genomes (M. tuberculosis, M. bovis, M. leprae), in collaboration with the Unité de Génétique Moléculaire Bactérienne (S. Cole). For M. tuberculosis, over 80 new CDS were predicted, notably through the use of the AMIGA program, and all the genes were classified according to a new functional classification. Many gene functions were revised, based on in-depth analysis of sequence comparison data and a careful survey of the recent literature (over 1,000 references). Proteomic data were also integrated into the TubercuList database.
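The reference-to-gene linking performed by BiblioDB amounts to a many-to-many relationship. The sketch below shows one possible way to store such links, using an in-memory SQLite database; the table names, column names and sample entries are assumptions made for illustration and are not taken from the actual BiblioDB or GenoList schemas.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a gene/reference link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE reference (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE gene_reference (
            gene_id INTEGER NOT NULL REFERENCES gene(id),
            reference_id INTEGER NOT NULL REFERENCES reference(id),
            PRIMARY KEY (gene_id, reference_id)
        );
    """)
    return conn

def link(conn, gene_name, title):
    """Link a reference to a gene, creating either record if needed."""
    conn.execute("INSERT OR IGNORE INTO gene (name) VALUES (?)", (gene_name,))
    gene_id = conn.execute(
        "SELECT id FROM gene WHERE name = ?", (gene_name,)).fetchone()[0]
    row = conn.execute(
        "SELECT id FROM reference WHERE title = ?", (title,)).fetchone()
    if row is None:
        ref_id = conn.execute(
            "INSERT INTO reference (title) VALUES (?)", (title,)).lastrowid
    else:
        ref_id = row[0]
    conn.execute(
        "INSERT OR IGNORE INTO gene_reference VALUES (?, ?)", (gene_id, ref_id))

def references_for(conn, gene_name):
    """Return the titles of all references linked to a gene, sorted."""
    rows = conn.execute(
        """SELECT r.title FROM reference r
           JOIN gene_reference gr ON gr.reference_id = r.id
           JOIN gene g ON g.id = gr.gene_id
           WHERE g.name = ? ORDER BY r.title""", (gene_name,))
    return [row[0] for row in rows]

# Hypothetical gene names and reference titles.
conn = build_demo_db()
link(conn, "geneA", "Paper one")
link(conn, "geneA", "Paper two")
link(conn, "geneB", "Paper one")
```

Keeping the links in a separate table lets one reference be attached to several genes, and one gene to several references, without duplicating either record.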

GenoList: towards a new multi-genome version (S. Moreira, L. Hummel, A. Marcel, E. Quevillon)

GenoList databases and Web servers have been up and running for over ten years and are internationally recognized. They combine a user-friendly presentation of the data, an intuitive browsing model, and query or analysis tools tightly integrated in the application and closely linked to the data. However, each server cannot handle more than a single organism, except for closely related strains. We are in the process of developing a multi-genome version of GenoList. This first required enhancing the conceptual data model, especially by adding relationships between organisms. The user interface is being completely rebuilt as well, and it will integrate new multi-genome comparison tools (coll. A. Le Roch, Unité des Cyanobactéries). On the technical side, we have decided to use a new development environment, WebObjects, which is well suited to the development of three-tier applications and allows us to maintain large Web applications quickly and easily.
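To make the idea of "relationships between organisms" concrete, here is a minimal sketch of what such an extension to a data model could look like. The class names, the `same_group` relation kind and the API are hypothetical; the actual GenoList model, built with WebObjects, is not described in this report.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Organism:
    name: str

@dataclass
class Catalog:
    """Registry of organisms plus symmetric, typed relations between them."""
    organisms: dict = field(default_factory=dict)  # name -> Organism
    relations: set = field(default_factory=set)    # {(name_a, name_b, kind)}

    def add(self, organism):
        self.organisms[organism.name] = organism

    def relate(self, a, b, kind):
        if a not in self.organisms or b not in self.organisms:
            raise KeyError("both organisms must be registered first")
        # Store the pair in sorted order so that (a, b) and (b, a) coincide.
        first, second = sorted((a, b))
        self.relations.add((first, second, kind))

    def related(self, name, kind):
        """Return the sorted names of organisms related to `name` by `kind`."""
        partners = []
        for a, b, k in self.relations:
            if k == kind and name in (a, b):
                partners.append(b if a == name else a)
        return sorted(partners)

# Organism names come from the Bordetella group mentioned earlier;
# the relation kind "same_group" is an illustrative assumption.
catalog = Catalog()
for n in ("B. pertussis", "B. parapertussis", "B. bronchiseptica"):
    catalog.add(Organism(n))
catalog.relate("B. pertussis", "B. parapertussis", "same_group")
catalog.relate("B. bronchiseptica", "B. pertussis", "same_group")
```

With relations stored explicitly, a multi-genome interface can ask which other genomes are comparable to the one being browsed instead of hard-coding one organism per server.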

SubScript: a transcriptome database (S. Moreira)

We built a database, named SubScript, for the storage and analysis of transcriptome data. It allows users to submit new transcriptome experiments obtained with micro- or macro-arrays. The data model was designed around the precise type of information needed (MGED-compliant), and the submission interface was designed to be user-friendly. This work was carried out in close interaction with the laboratories performing the experiments, especially the Génopole TP2 (J.-Y. Coppée, G. Lacourrège). Statistical analysis tools are now being developed (coll. L. Marisa, Unité de Génétique des Génomes Bactériens), taking into account the intrinsic complexity of the data and the experimental design. The application is developed using the WebObjects Web application server.

This project was initiated in the framework of the "BACELL Network" program for the functional analysis of Bacillus subtilis, in collaboration with the Unité de Génétique des Génomes Bactériens (A. Danchin). New collaborations have been set up with IP groups participating in other functional analysis projects:

  • The Unité de Programmation Moléculaire et Toxicologie Génétique (P. Quillardet) belongs to a French consortium for the functional analysis of Escherichia coli (P. Bouloc).

  • The Unité d'Immunologie Moléculaire des Parasites is participating in the Pal+ program for the study of the transcriptome of Plasmodium falciparum (P. David).

  • The Laboratoire de Génomique des Microorganismes Pathogènes (P. Glaser) is analyzing the transcriptome of Streptococcus agalactiae.

These collaborations will increase the amount of data available in our database, thus opening new opportunities for the cross-comparison of metabolic regulation in closely related organisms.

Phylogeny (C. Dauga)

Strategies to highlight gene evolution processes were defined in order to choose the best approaches for extracting phylogenetic information from gene sequences. This requires an in-depth understanding of the concepts used in phylogeny, and knowledge of the limitations of phylogenetic tree construction models.

These strategies are intended to address biological problems encountered by many research groups in molecular identification studies or in the epidemiology of organisms:

  • Collaboration with J. Raymond (Hôpital Saint Vincent de Paul) and A. Labigne (Unité de Pathogénie Bactérienne des Muqueuses) for the phylogeny of genes evolving by genetic drift and genetic recombination, in order to understand the mode of transmission of Helicobacter pylori within a family;

  • Collaboration with A.-B. Failloux (Unité d'Écologie des Systèmes Vectoriels) for the phylogenetic study of two variable genes in order to determine the geographic origin of recently imported strains of Aedes albopictus.

These strategies are also required to understand the evolutionary history of genes, lateral transfers, genetic recombination and duplications in genome exploration:

  • Collaboration with H. de Reuse (Unité de Pathogénie Bactérienne des Muqueuses) in order to determine the mode of evolution of genes coding for aliphatic amidases in the Helicobacter pylori genome.

A new project for the definition of phylogenetic strategies to detect lateral gene transfers between neighbouring species is now being developed.
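As one illustration of the model limitations mentioned at the start of this section, the sketch below computes the proportion of differing sites between two aligned sequences and applies the Jukes-Cantor correction. This standard textbook model is used here only as an example; it is not stated to be the method used by the platform, and the function names and sequences are made up.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences.

    Positions where either sequence has a character other than A, C, G, T
    (gaps, ambiguity codes) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3).

    The formula is undefined for p >= 0.75 (saturated sequences);
    infinity is returned in that case to signal the limit of the model.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

# Made-up aligned sequences differing at one site out of eight.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor(p)
```

The corrected distance is always at least as large as the observed proportion, and it diverges as sequences approach saturation, which is one reason why highly variable genes need careful handling before tree construction.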

Teaching activities (C. Boursaux-Eude, C. Dauga, L. Frangeul)

Teaching is one of the missions of PT4:

  • Setting up of an initiation to bioinformatics for the General Microbiology course;

  • Participation in the Genome Analysis course (data processing);

  • Co-organization of the Inserm workshop 135: Identification of non-cultured bacteria in clinical and environmental settings;

  • Lecture and class work "Phylogenetic analysis, theoretical principles and practice" for the Bioinformatics teaching of Paris 7 University.

Keywords: annotation, databases, genome, transcriptome, phylogeny



Office staff
  LUCHIER, Françoise, fluchier@pasteur.fr (part-time)

Researchers
  DAUGA, Catherine, IP, Research Scientist, cdauga@pasteur.fr
  MOSZER, Ivan, IP, Research Scientist, moszer@pasteur.fr

Scientific trainees
  PRYOR, Melinda, Post-doc fellow, mpryor@pasteur.fr

Other personnel
  BOURSAUX-EUDE, Caroline, Engineer, cbx@pasteur.fr
  DEHOUX, Pierre, Engineer, pdehoux@pasteur.fr
  FRANGEUL, Lionel, Engineer, lfrangeu@pasteur.fr
  HUMMEL, Laurence, Engineer, lhummel@pasteur.fr (fixed-term contract)
  MARCEL, Anne, Engineer, amarcel@pasteur.fr (fixed-term contract)
  MOREIRA, Sandrine, Engineer, moreira@pasteur.fr

Activity Reports 2002 - Institut Pasteur
