
  Acting director: Cole, Stewart (stcole@pasteur.fr)



Technical Laboratory 4 (PT4) belongs to the Genopole Institut Pasteur. The unit was established in 2000 and grew steadily during 2001. Its missions are diverse, but it is primarily concerned with genome analysis. Most of its activities involve collaborations with research laboratories, technical platforms, the computing team and the teaching department.



Collaboration with the laboratory: "Genomics of Pathogenic Micro-organisms" managed by Frank Kunst (Boursaux-Eude, C. and Frangeul, L.)

We have developed a software package called "CAAT-Box" (Contig Assembly and Annotation Tool Box). It was used in sequencing the genome of Listeria monocytogenes, obtained by a shotgun strategy, and we were involved in the supervision and finishing of the complete genome (2.9 Mb). The package is now being used by the GMP laboratory (Institut Pasteur) for the following sequencing projects: Listeria innocua, Photorhabdus luminescens and Streptococcus agalactiae.

The "CAAT-Box" contains several modules that fall into four categories:

  • With the first part, we can follow the evolution of the shotgun project by comparing the results from previous and current assemblies.

  • The second part predicts links between contigs during the finishing phase of the assembly.

  • The third part enables the annotation of the genome to start while the assembly process is still being completed. Individual proteins or IPF are used.

  • The last part makes the annotations accessible and editable via the Internet. Users can read, annotate and modify each IPF, and can access related results generated by programs such as BLAST, Genemark or Toppred2. Each modification or annotation made by a user is immediately visible to all the other annotators.

Three new modules were created during 2001 to improve the "CAAT-Box":

  • One module for the automatic detection of primers for the study of the transcriptome,

  • Another module for the automatic detection of potential frameshifts,

  • A third module to import data from FASTA databanks into the IPF.

Collaboration with the laboratory: "Genomics of the bacterial genomes" managed by Antoine Danchin (Boursaux-Eude, C., Frangeul, L., Marcel, A. and Quevillon, E.)

The "CAAT-Box" is currently being used to follow the progression of the assembly phase in the genome of Penicillium marneffei. This project is a collaboration with Antoine Danchin and the "HKU-Pasteur Research Centre" in Hong Kong.

A second project consists of setting up new databases based on the GenoList model (collaboration with Ivan Moszer). Initially, two strains of Staphylococcus aureus (N315 and Mu50) were chosen to create a new database called AureOList. As in the PyloriGene database, the two strains can be searched separately or jointly.

Building the database involves several steps. First, the most recent EMBL or GenBank file with a "correct" annotation of the organism must be retrieved. Second, a list of characteristics for the various fields (gene, note, product, ...) must be decided before parsing. The next stage is the most significant: a parsing program (written in Perl) is adapted so that each data file is processed correctly according to these characteristics. The output of this program contains all the information as SQL, which fills some of the empty tables of the database. Not all the tables can be completed in this way, because the EMBL or GenBank files do not contain all the necessary information. This is the case for the isoelectric points, the molecular weights, the files required for BLAST and FASTA, the functional classification and the correspondences between the CDS of each strain. Finally, some modifications to the interface are required, including help files, logo and maps.

AureOList is currently functional but not yet public (http://genolist.pasteur.fr/AureoList/); the connections between the CDS of the two strains still need to be completed.
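The parsing stage described above can be illustrated with a minimal sketch. It is written in Python rather than Perl, it is not the unit's actual parser, and it handles only a simplified GenBank feature table. The table name `cds`, its column names, the choice of retained qualifiers and the example record are all assumptions made for illustration.

```python
import re

# Qualifiers kept for the database; this selection is an illustrative assumption.
FIELDS = ("gene", "note", "product")

# Feature keys start after 5 spaces, qualifiers after 21 spaces.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)$")
QUALIFIER_RE = re.compile(r'^ {21}/(\w+)(?:="?(.*?)"?)?$')


def parse_cds(lines):
    """Yield one dict per CDS feature found in GenBank feature-table lines."""
    current = None   # dict for the CDS being read, or None outside a CDS
    last_key = None  # retained qualifier that a continuation line may extend
    for line in lines:
        line = line.rstrip("\n")
        feature = FEATURE_RE.match(line)
        if feature:
            if current is not None:
                yield current
            key, location = feature.groups()
            current = {"location": location} if key == "CDS" else None
            last_key = None
            continue
        if current is None:
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier:
            name, value = qualifier.groups()
            last_key = name if name in FIELDS else None
            if last_key:
                current[name] = value or ""
        elif last_key and line.startswith(" " * 21):
            # Continuation of a qualifier value that spans several lines.
            current[last_key] += " " + line.strip().rstrip('"')
    if current is not None:
        yield current


def sql_quote(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_insert(strain, cds):
    """Build an INSERT statement for a hypothetical `cds` table."""
    present = [f for f in FIELDS if f in cds]
    columns = ["strain", "location"] + present
    values = [strain, cds["location"]] + [cds[f] for f in present]
    return "INSERT INTO cds ({}) VALUES ({});".format(
        ", ".join(columns), ", ".join(sql_quote(v) for v in values)
    )


# Made-up example record, for illustration only.
Q = " " * 21
EXAMPLE = [
    "     source          1..2000",
    Q + '/organism="Staphylococcus aureus"',
    "     CDS             517..1878",
    Q + '/gene="dnaA"',
    Q + '/product="chromosomal replication initiator',
    Q + 'protein"',
]

for cds in parse_cds(EXAMPLE):
    print(to_insert("N315", cds))
```

Fields that the flat file does not provide (isoelectric point, molecular weight, functional class) are simply absent from the generated statement, which mirrors the point above that some tables have to be completed from other sources.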

Other micro-organisms are likely to be integrated into similar databases. A new strategy for parsing files is currently being studied, with a view to creating a multi-organism database.

Part of our team, in collaboration with Ivan Moszer, is also involved in the construction of a multi-genome database available via the Internet. To this end, we have modified the structure and the interface of the GenoList database (http://genolist.pasteur.fr). The structure is still in preparation. We are testing a universal parser for GenBank files that will be compatible with the structure of our databases. We will use another language (Java) to implement the database, and the WebObjects software will be used to construct the Web interface.

Collaboration with the laboratory: "Molecular and Bacterial Genetics" managed by Stewart Cole (Camus, J.C. and Pryor-Stinear, M.)

Mycobacterium tuberculosis, the causative agent of tuberculosis, one of the most serious contagious diseases, is a major world-wide human pathogen. In 1998, the Genetic Molecular Bacteriology unit at the Institut Pasteur took the initiative to sequence and annotate the complete genome of the virulent strain H37Rv of M. tuberculosis. The sequencing project was carried out in collaboration with the Sanger Centre (Hinxton, England). The information from this project was incorporated into a database called TubercuList (http://www.genolist.pasteur.fr/TubercuList/), which has been made available to other researchers via the Internet. Because the available scientific information evolves constantly, it seemed important to update TubercuList. We have therefore undertaken the systematic re-annotation of all the ORFs in the genome of M. tuberculosis. We have re-organised the layout of the annotation to include new scientific information (E.C. number, principal scientific references, functions of the putative proteins, etc.). New ORFs identified since the original annotation have been added. We have also incorporated into the new annotation numerous genomic comparisons between M. tuberculosis and Mycobacterium leprae. These comparisons are now possible thanks to the completion of the sequence and annotation of M. leprae by the above group. We have also started to create links between certain proteins in TubercuList and the scientific literature. This work has been undertaken in collaboration with the Wellcome Trust Genome Campus (Hinxton, England) and the Centre INFOBIOGEN (Genopole, Evry, France).

Collaboration with the laboratory: "Microbiology and Environment" managed in the interim by Pierre Béguin (Frangeul, L.)

Stanford University has made available the genome sequence of Candida albicans (1213 contigs for 17 Mb). The "CAAT-Box" was used by the European Consortium to annotate these contigs. The annotation of more than 14,000 ORFs was coordinated by Christophe d'Enfert and has recently been completed.

Collaboration with the laboratory: "Cyanobacteria" managed by Nicole Tandeau de Marsac (Frangeul, L.)

The "CAAT-Box" is currently being used to follow the progression of the assembly phase in the genome of Microcystis aeruginosa.


Collaboration with the laboratory: "Molecular and Medical Bacteriology" managed by Guy Baranton (Frangeul, L.)

More recently, the "CAAT-Box" has been made available to the team of Isabelle Saint Girons to access both chromosomes of Leptospira interrogans and to improve the annotations.

Collaboration with PT1 Genomics (Genopole) managed by Christiane Bouchier and the Software and Databases group managed by Bernard Caudron (Boursaux-Eude, C. and Frangeul, L.)

We have developed several programs called "Chrosort", "CheckCap" and "runsite".

"Chrosort" and "CheckCap" help manage the sequence chromatograms produced by the sequencing team (Genopole, PT1 Genomics, Institut Pasteur). "Chrosort" renames the chromatograms, sorts them according to their quality and saves the scores for each capillary on all the sequencers in the laboratory. "CheckCap" reads the results obtained from "Chrosort" to detect any capillary whose average score is significantly lower than the others. The group can then determine the reason for the poor results and take appropriate action, for example washing the capillaries more thoroughly.

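The kind of check attributed to "CheckCap" above can be sketched in Python as follows. This is only an illustration, not the program itself: the text does not say how "significantly lower" is decided, so the rule used here (a mean score more than `threshold` standard deviations below the mean of the other capillaries) is an assumption, as are the function name, the data layout and the example scores.

```python
from statistics import mean, stdev


def flag_weak_capillaries(scores, threshold=2.0):
    """Return the ids of capillaries whose mean score is far below the others.

    scores: dict mapping a capillary id to a list of per-read quality scores.
    threshold: number of standard deviations used as cut-off (assumed rule).
    """
    averages = {cap: mean(vals) for cap, vals in scores.items() if vals}
    flagged = []
    for cap, avg in averages.items():
        others = [a for c, a in averages.items() if c != cap]
        if len(others) < 2:
            continue  # too few capillaries to estimate a spread
        if avg < mean(others) - threshold * stdev(others):
            flagged.append(cap)
    return sorted(flagged)


# Hypothetical scores for four capillaries; capillary 4 performs poorly.
EXAMPLE_SCORES = {
    1: [520, 540, 510],
    2: [530, 525, 515],
    3: [500, 545, 535],
    4: [210, 180, 240],
}

print(flag_weak_capillaries(EXAMPLE_SCORES))
```

Each capillary is compared with the others rather than with the overall mean, so that a single very poor capillary does not inflate the spread it is judged against.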
To assist PT1, we have also written a program called "runsite". It can be used together with the programs Phred and Phrap as an integrated cascade, prp (PhrepRunsitePhrap). Phred generates two files: the sequence file in FASTA format and the quality file. "runsite" uses the sequence file to search for the start or end of a particular sequence. If the sequence is not found, the quality file is consulted: "runsite" examines the scores in the quality file and, depending on the user's choice, removes poor-quality sequence.
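The two-stage logic described for "runsite" can be sketched in Python as shown below. The sketch works on one read already loaded in memory and is not the actual program. What is done once the searched sequence is found (here, everything up to and including it is cut off), the quality cut-off of 20 and all names are assumptions made for illustration.

```python
def trim_read(sequence, qualities, site, min_quality=20, trim_low_quality=True):
    """Trim one read, first by a searched site, otherwise by quality.

    sequence: bases of the read, as in the Phred FASTA output.
    qualities: one integer score per base, as in the Phred quality file.
    site: the particular sequence to search for (hypothetical parameter).
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")

    position = sequence.find(site)
    if position != -1:
        # Site found: keep only what follows it (assumed behaviour).
        start = position + len(site)
        return sequence[start:], qualities[start:]

    if not trim_low_quality:
        return sequence, qualities  # the user chose to keep the read as is

    # Site not found: keep the span between the first and the last base
    # whose score reaches the cut-off.
    good = [i for i, q in enumerate(qualities) if q >= min_quality]
    if not good:
        return "", []
    first, last = good[0], good[-1]
    return sequence[first:last + 1], qualities[first:last + 1]


# Hypothetical reads: the first contains the site, the second does not.
print(trim_read("ACGTGAATTCGGCTA", [30] * 15, "GAATTC")[0])
print(trim_read("NNACGTACGTNN",
                [5, 8, 30, 35, 40, 40, 38, 36, 30, 25, 9, 4],
                "GAATTC")[0])
```

The `trim_low_quality` flag stands in for the user's choice mentioned above: when it is false and the site is absent, the read is returned unchanged.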

Teaching activities (Boursaux-Eude, C. and Frangeul, L.)

Teaching is also one of the missions of the PT4 laboratory. We have participated in the preparation of the bio-computing parts of the "Cours de Microbiologie Générale" and of the "Cours d'Analyse des Genomes" (Institut Pasteur).

  web site

More information on our web site


Publications of the unit on Pasteur's references database


  Office staff Researchers Scientific trainees Other personnel

Giletti, Benjamin, bgilett@pasteur.fr

Pryor-Stinear, Melinda, post-doctoral fellowship, mpryor@pasteur.fr

Boursaux-Eude, Caroline, bio-computing engineer, cbx@pasteur.fr

Camus, Jean-Christophe, bio-computing engineer, jccamus@pasteur.fr

Frangeul, Lionel, scientific engineer, lfrangeu@pasteur.fr

Marcel, Anne, bio-computing engineer, amarcel@pasteur.fr

Quevillon, Emmanuel, bio-computing engineer, tuco@pasteur.fr


