Introduction. The activities of Platform 4 (PF4), “Genomic Integration and Analysis”, fall into three main areas: assembly, annotation, and re-annotation of genome sequences; software development for genomic databases; and genome analysis and evolutionary genomics. Our work aims to create an integrated environment for exploring the functioning and evolution of microbial genomes.

Genome annotation and analysis. We are developing the CAAT-Box software (“Contig-Assembly and Annotation Tool-Box”), a set of methods for following up the assembly phases of a genome sequencing project and for starting the annotation phase before the full genome sequence is complete (finishing stage). Through this software we maintain several collaborations with groups involved in large-scale sequencing projects, such as the Génolevures 2 consortium for the annotation and analysis of yeast genomes, and the laboratories working on the genome of the cyanobacterium Microcystis aeruginosa. We are also involved in the annotation of several bacterial genomes, such as the saprophytic bacterium Leptospira biflexa and a Helicobacter pylori strain involved in MALT lymphoma, and we are contributing to the re-annotation of well-studied bacterial species (e.g. we are participating in a project aimed at the complete re-sequencing, re-annotation and metabolic reconstruction of the genome sequence of Bacillus subtilis).

Integrated specialized databases. We are developing two main database software packages. GenoList is an integrated environment dedicated to the query and analysis of genomic data from bacterial species. The current version integrates genome data from more than 100 species. A user interface was developed for querying and navigating the data; it includes sequence analysis and subtractive genome analysis tools, so that the user can browse the data in a powerful and intuitive way. A new instance of the database dedicated to Candida species was set up recently. GenoScript is an integrated environment for transcriptome analysis, designed for collecting, storing, querying and analyzing DNA chip data. GenoScript allows the user to enter experiments and their associated results, to run queries, and to perform statistical analyses on the data. The database interface can be easily customized, and a dedicated query interface makes it possible to build complex multicriteria queries.
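To make the idea of subtractive genome analysis concrete, the sketch below selects the gene families shared by one group of genomes and absent from another. It is only an illustration under stated assumptions: the function name `subtractive_genes`, the genome labels and the family identifiers are hypothetical, and the code does not reflect the actual GenoList implementation or data model.

```python
from typing import Dict, Set


def subtractive_genes(
    families_by_genome: Dict[str, Set[str]],
    include: Set[str],
    exclude: Set[str],
) -> Set[str]:
    """Return gene families found in every `include` genome and in no `exclude` genome.

    `families_by_genome` maps a genome label to the set of gene-family
    identifiers detected in that genome (hypothetical input format).
    """
    if not include:
        raise ValueError("at least one genome must be included")
    # Families present in all genomes of interest.
    shared = set.intersection(*(families_by_genome[g] for g in include))
    # Families present in at least one genome to subtract.
    unwanted = set().union(*(families_by_genome[g] for g in exclude))
    return shared - unwanted


# Hypothetical example data: two pathogens and one saprophyte.
families = {
    "pathogen_A": {"fam1", "fam2", "fam3", "fam4"},
    "pathogen_B": {"fam1", "fam2", "fam4", "fam5"},
    "saprophyte": {"fam1", "fam5", "fam6"},
}

candidates = subtractive_genes(
    families,
    include={"pathogen_A", "pathogen_B"},
    exclude={"saprophyte"},
)
print(sorted(candidates))  # ['fam2', 'fam4']
```

In practice the family assignments would come from a homology search, and the choice of similarity thresholds would strongly influence the result.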

Evolutionary genomics. We combine evolutionary and sequence analysis to provide reliable and sensitive tools for exploring genome complexity. We automated tests that compare tree topologies in order to detect genes acquired by transfer, and evaluated this new tool on sequences obtained by simulation. We are now working on real data from complete bacterial genomes. We built an automated procedure, “PhyloClust”, to identify lineages of serine proteases involved in the immunity of vector mosquitoes. In collaboration with biologists, we are now identifying the specific proteins involved in the response of Aedes aegypti to Chikungunya virus. We also used tree-building methods to represent the microbial relationships of a new division of uncultivable bacteria, discovered in an anaerobic sludge digester by a metagenomic approach. Lastly, we studied the usefulness of genetic markers and dynamic evolutionary tools for following pathogen and vector populations, in collaboration with several units of the Institut Pasteur and the international network.
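The principle of comparing tree topologies can be illustrated with a simple clade-based distance between a gene tree and a reference species tree. This is a deliberately simplified stand-in, not the statistical tests automated by the platform: it assumes rooted trees written as nested tuples of leaf names, and the names `clades` and `topology_distance` are hypothetical. A gene tree that conflicts strongly with the species tree would be a candidate for closer inspection, not proof of transfer.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (assumed format).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of leaves found under every internal node."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        # Merge the leaf sets of all children of this internal node.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def topology_distance(a: Tree, b: Tree) -> int:
    """Count the clades present in one tree but not in the other."""
    return len(clades(a) ^ clades(b))


# Hypothetical four-taxon example.
species_tree = ((("A", "B"), "C"), "D")
congruent_gene_tree = ((("B", "A"), "C"), "D")
conflicting_gene_tree = ((("A", "D"), "C"), "B")

print(topology_distance(species_tree, congruent_gene_tree))    # 0
print(topology_distance(species_tree, conflicting_gene_tree))  # 4
```

A raw distance of this kind ignores branch support and sampling error, which is why a statistical test on topologies is needed before concluding that a gene was acquired by transfer.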

Keywords: Genomics / Annotation / Integrated database / Transcriptome / Evolutionary genomics


Data and functionality integration in the GenoList environment


