Genome Analysis and Integration

Members: Dr BOURSAUX-EUDE Caroline / Dr DAUGA Catherine / Dr DEHOUX Pierre /
Dr FRANGEUL Lionel / LECHAT Pierre / MOREIRA Sandrine / RADONDY Yoan

  Annual Report

Introduction. The activities of Platform 4 (PF4) "Genomic Integration and Analysis" fall into three main areas: assembly, annotation and re-annotation of genome sequences; development of genomic database software; and genome analysis and molecular phylogeny. Our overall aim is to create an integrated environment for exploring how microbial genomes function and evolve.

Genome annotation and analysis. We are developing the CAAT-Box software (“Contig-Assembly and Annotation Tool-Box”), a set of methods for monitoring the assembly phases of a genome sequencing project and for starting annotation before the full genome sequence is completed (finishing stage). Through this software we maintain several collaborations with groups involved in large-scale sequencing projects, such as the Génolevures 2 consortium for the annotation and analysis of yeast genomes. We are also annotating several bacterial genomes, such as that of the saprophytic bacterium Leptospira biflexa, and we have contributed to the re-annotation of well-studied bacterial species (Mycobacterium tuberculosis, Bacillus subtilis). Finally, we are working on the annotation and analysis of the genome of Anopheles gambiae, the main malaria vector.

Integrated specialized databases. We are developing two main database systems. GenoList is an integrated environment for querying and analyzing genomic data from bacterial species; the current version integrates genome data from more than 100 species. Its user interface supports querying and navigating the data and includes sequence analysis and subtractive genome analysis tools, so that users can browse the data efficiently and intuitively. GenoScript is an integrated environment for transcriptome analysis, designed for collecting, storing, querying and analyzing DNA chip data. GenoScript lets users enter experiments and their associated results, run queries and perform statistical analyses on the data. The database interface is easily customized, and its query interface supports building complex multi-criteria queries.

Molecular phylogeny. Phylogenetic tree-building methods provide reliable and sensitive tools for reconstructing the evolutionary history of genomes. We used tree-based representations of microbial relationships to delineate a new division of uncultivable bacteria and to characterize Enterobacter sakazakii strains from neonatal outbreaks. We also automated tests that compare tree topologies in order to detect genes acquired by horizontal transfer; in a study of the evolutionary modes of Helicobacter pylori strains, we described routes of infection within families. Finally, we built an automated procedure, “PhyloClust”, for partitioning large gene families; it allowed us to identify lineages of serine proteases involved in the development and immunity of A. gambiae. We are now exploring the usefulness of genetic markers and dynamic evolutionary tools for monitoring populations of the mosquito vectors of Rift Valley fever and chikungunya.

  Web Site

More information on our web site


2006 publications of the unit in the Institut Pasteur reference database

Activity Reports 2006 - Institut Pasteur