MEMBERS: Dr BOURSAUX-EUDE Caroline / CAHUZAC Romain / Dr DAUGA Catherine / Dr DEHOUX Pierre / Dr FRANGEUL Lionel / HIRCHAUD Édouard / LECHAT Pierre / Dr MEURICE Guillaume / ROUSSEAU Sandrine / SAINT-MARTIN Françoise / TORRENT Jessica / VENTADOUR Pierre

Introduction. The activities of Platform 4 (PF4), “Genomic Integration and Analysis”, are divided into three main areas: assembly, annotation and re-annotation of genome sequences; development of genomic database software; and genome analysis and evolutionary genomics. Our overall aim is to build an integrated environment for exploring the functioning and evolution of microbial genomes.

Genome annotation and analysis. We are developing the CAAT-Box software (“Contig-Assembly and Annotation Tool-Box”), a set of methods for monitoring the assembly phases of a genome sequencing project and for starting the annotation phase before the full genome sequence is completed (finishing stage). Through this software we maintain several collaborations with groups involved in large-scale sequencing projects, such as the genome sequencing project of the cyanobacterium Microcystis aeruginosa and the annotation and analysis project on Klebsiella rhinoscleromatis. We are also involved in the annotation of several bacterial genomes, including those of the saprophytic bacterium Leptospira biflexa and of a Helicobacter pylori strain associated with MALT lymphoma. In addition, we are participating in a project on the re-sequencing, re-annotation and metabolic reconstruction of the Bacillus subtilis genome, and we have started a project on the re-annotation of Clostridium difficile.
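Following up an assembly usually means tracking a few summary statistics from one assembly snapshot to the next. The sketch below is not CAAT-Box code (its internals are not described here); it is a minimal Python illustration, assuming contigs are available simply as a list of lengths in base pairs, of the standard N50 statistic together with a few companion counts. The function names `n50` and `assembly_summary` and the contig lengths in the usage lines are hypothetical.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 of a set of contig lengths (bp).

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half:
            return length
    return lengths[-1]  # not reached for non-empty input


def assembly_summary(contig_lengths: List[int]) -> Dict[str, int]:
    """Summarize one assembly snapshot: contig count, total size, longest contig, N50."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "longest_bp": max(contig_lengths),
        "n50_bp": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths for two successive assembly rounds (illustrative values only).
    snapshots = {
        "round 1": [1200, 800, 650, 400, 300, 150],
        "round 2": [2500, 1100, 400],
    }
    for name, contigs in snapshots.items():
        print(name, assembly_summary(contigs))
```

Comparing such summaries round after round gives a quick view of whether new reads are actually merging contigs, which is the kind of signal that helps decide when annotation can start before finishing.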

Integrated specialized databases. We are developing two main database systems. GenoList is an integrated environment dedicated to querying and analyzing genomic data from bacterial species. A new version, currently under construction, integrates genome data for more than 700 species from the Genome Reviews repository. A user interface was developed for querying and navigating the data, including sequence analysis and subtractive genome analysis tools, so that users can browse the data intuitively. GenoScript is an integrated environment for transcriptome analysis, designed for collecting, storing, querying and analyzing DNA chip data. GenoScript allows users to enter experiments and their associated results, to run queries and to perform statistical analyses on the data. The database interface can be easily customized, and a dedicated query interface makes it possible to build complex multicriteria queries.
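Subtractive genome analysis, as offered in GenoList, amounts to selecting the genes of a query genome that have homologs in some genomes and none in others. The following is a minimal Python sketch under stated assumptions, not GenoList's implementation: homology is assumed to be precomputed (for example from similarity searches with fixed thresholds) and stored as a mapping from each query gene to the set of genomes where a homolog was found. The function name `subtractive_query` and all gene and genome identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def subtractive_query(
    query_genes: Iterable[str],
    homolog_genomes: Dict[str, Set[str]],
    present_in: Iterable[str] = (),
    absent_from: Iterable[str] = (),
) -> List[str]:
    """Select query genes by presence/absence of homologs in other genomes.

    A gene is kept if it has a homolog in every genome of `present_in`
    and in none of the genomes of `absent_from`. Genes missing from
    `homolog_genomes` are treated as having no homolog anywhere.
    """
    required = set(present_in)
    excluded = set(absent_from)
    selected = []
    for gene in query_genes:
        found = homolog_genomes.get(gene, set())
        # All required genomes must be among the hits; no excluded genome may be.
        if required <= found and not (excluded & found):
            selected.append(gene)
    return selected


if __name__ == "__main__":
    # Hypothetical homology table for three genes of a query genome.
    homologs = {
        "gene_001": {"genome_B", "genome_C"},
        "gene_002": {"genome_B"},
        "gene_003": set(),
    }
    genes = ["gene_001", "gene_002", "gene_003"]
    # Genes shared with genome_B but absent from genome_C.
    print(subtractive_query(genes, homologs, present_in=["genome_B"], absent_from=["genome_C"]))  # → ['gene_002']
    # Genes specific to the query genome with respect to genomes B and C.
    print(subtractive_query(genes, homologs, absent_from=["genome_B", "genome_C"]))  # → ['gene_003']
```

Keeping the presence and absence lists as separate parameters mirrors how such a query is typically phrased in an interface ("present in these genomes, absent from those"), and set operations keep each check cheap even for hundreds of genomes.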

Evolutionary genomics. We combine evolutionary and sequence analyses to provide reliable and sensitive tools for exploring genome complexity. We built “PhyloClust”, an automated computational method for the functional clustering of proteins on the basis of their phylogenetic relationships. This tool was designed to subdivide large protein families and to classify related proteins for which no functional information is available. In collaboration with entomologists, we are now identifying immunity-related lineages of serine proteases in 7 insect genomes. We are also developing comparative genomics approaches to identify proteins involved in host-pathogen interactions in Chlamydiaceae. We organized the repertoire of hypothetical proteins in 13 genomes according to their secondary structures, protein domains and evolutionary pressure, in order to propose new targets to cell biologists. Lastly, we studied the usefulness of genetic markers and dynamic evolutionary tools for monitoring pathogen and vector populations, in collaboration with units of the Institut Pasteur and of its international network.
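The text does not detail how PhyloClust subdivides a protein family, so the sketch below shows a generic alternative rather than PhyloClust itself: cut a rooted gene tree into the largest clades whose maximum patristic distance (tip-to-tip sum of branch lengths) stays within a chosen threshold. The `Node` class, the threshold parameter `max_diameter` and the protein names are hypothetical; a real pipeline would read the tree from a Newick file produced by a phylogeny program.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Rooted tree node; `length` is the branch length to the parent."""
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def tips(node: Node) -> List[str]:
    """Names of all tips (proteins) below `node`, left to right."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in tips(child)]


def height(node: Node) -> float:
    """Longest path, as a sum of branch lengths, from `node` down to one of its tips."""
    if not node.children:
        return 0.0
    return max(child.length + height(child) for child in node.children)


def diameter(node: Node) -> float:
    """Largest patristic distance between two tips of the clade rooted at `node`."""
    if not node.children:
        return 0.0
    depths = sorted((c.length + height(c) for c in node.children), reverse=True)
    # Longest tip-to-tip path passing through `node` (needs two distinct children).
    through_node = depths[0] + depths[1] if len(depths) > 1 else 0.0
    return max([through_node] + [diameter(c) for c in node.children])


def split_clades(node: Node, max_diameter: float) -> List[List[str]]:
    """Partition tips into maximal clades whose diameter is <= max_diameter."""
    if not node.children or diameter(node) <= max_diameter:
        return [tips(node)]
    clusters: List[List[str]] = []
    for child in node.children:
        clusters.extend(split_clades(child, max_diameter))
    return clusters


if __name__ == "__main__":
    # Hypothetical four-protein family made of two well-separated subfamilies.
    family = Node(children=[
        Node(length=1.0, children=[Node("protA", 0.1), Node("protB", 0.2)]),
        Node(length=1.0, children=[Node("protC", 0.1), Node("protD", 0.3)]),
    ])
    print(split_clades(family, max_diameter=0.5))  # → [['protA', 'protB'], ['protC', 'protD']]
```

Recomputing `height` and `diameter` at every level keeps the sketch short but costs time quadratic in tree size; caching these values per node would be the natural optimization for large families.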

