| Genome Analysis and Integration |
| HEAD | Dr MOSZER Ivan / moszer@pasteur.fr | |
| MEMBERS | Dr BOURSAUX-EUDE Caroline / CHENNEN Kirsley / Dr DAUGA Catherine / Dr DEHOUX Pierre / Dr DESCORPS-DECLÈRE Stéphane / ÉTIENNE Annie / Dr FRANGEUL Lionel / HIRCHAUD Édouard / LECHAT Pierre / Dr MEURICE Guillaume / ROUSSEAU Sandrine / Dr SOUCHE Erika / TORRENT Jessica |
| Annual Report |
|
Introduction. Platform 4 (PF4) works on the integration and analysis of large-scale genomic data, through software development and collaborative bioanalysis projects. These activities include assembly, annotation, and re-annotation of genome sequences; engineering of genomic databases; evolutionary genomics; and genome analysis, including next-generation sequencing (NGS). Our aim is to create an integrated environment for exploring the functioning and evolution of microbial genomes.
Genome annotation. We are developing the CAAT-Box software (“Contig-Assembly and Annotation Tool-Box”), a set of methods for following the assembly phases of a genome sequencing project and for starting annotation before the full genome sequence is completed (finishing stage). This software supported several collaborations with groups involved in large-scale sequencing projects, such as the annotation and analysis project on Klebsiella rhinoscleromatis. We also took part in the expert annotation of various bacterial genomes, such as a Helicobacter pylori strain involved in MALT lymphoma. In addition, we participated in a project on the re-sequencing, re-annotation and metabolic reconstruction of the Bacillus subtilis genome, and we have undertaken the re-annotation of Clostridium difficile.
Integrated specialized databases. We are developing two main database software packages. GenoList is an integrated environment dedicated to the comparative analysis of microbial genomes. The latest release integrates genome data for more than 700 species from the Genome Reviews repository (http://genolist.pasteur.fr/GenoList). The query and navigation interface was recently supplemented with a dynamic module for viewing the synteny organization of genomes, and with features that make selecting organisms of interest more intuitive and straightforward.
GenoScript is an integrated environment for transcriptome analysis, designed for collecting, storing, querying and analyzing DNA chip data (http://genoscript.pasteur.fr). GenoScript allows users to enter experiments and their associated results, and to run various queries on the data. The interface was completely revamped in the recently released version 3 of the application, which adds multicriteria data queries and visualization tools such as Cytoscape.
Evolutionary genomics. We combine evolutionary and sequence analysis to provide reliable and sensitive tools for exploring genome complexity. We developed comparative genomics approaches to identify proteins involved in host-pathogen interactions in Chlamydiaceae. We organized the repertoire of hypothetical proteins in 14 genomes according to their secondary structures, protein domains and evolutionary pressure, in order to propose new targets to cell biologists. We built “PhyloClust”, an automated computational method for the functional clustering of proteins on the basis of phylogenetic relationships. This tool was used to subdivide large protein families in insect genomes and to obtain a classification of serine proteases and related proteins for which we have no functional information. We also studied the evolutionary dynamics of populations of mosquito vectors, and performed a biogeographic analysis of Taenia solium, the causative agent of cysticercosis, in Madagascar (collaborations with the international network).
Population genomics and NGS analysis. We are working on the analysis of next-generation sequencing (NGS) data, in the framework of genome re-sequencing and polymorphism identification projects. To this end, we have developed several programs that filter millions of reads according to their quality and other criteria, and that characterize possible single nucleotide polymorphisms (SNPs) and other genomic features (e.g. genomic islands).
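As an illustration of quality-based read filtering, the following is a minimal sketch and not the platform's own programs, whose criteria are not described here. It assumes four-line FASTQ records and a Phred+33 quality encoding (the `offset` parameter can be set to 64 for older Illumina pipelines). The function names and the thresholds `min_mean_quality` and `min_length` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from four-line FASTQ records (assumed, unwrapped format)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(
    reads: Iterable[Read],
    min_mean_quality: float = 20.0,  # hypothetical threshold
    min_length: int = 30,            # hypothetical threshold
    offset: int = 33,
) -> Iterator[Read]:
    """Keep reads that are long enough and have a sufficient mean quality."""
    for name, seq, qual in reads:
        if len(seq) >= min_length and mean_quality(qual, offset) >= min_mean_quality:
            yield name, seq, qual
```

Because both functions are generators, reads are processed one at a time, which keeps memory use flat when a file holds millions of reads; for example, `filter_reads(parse_fastq(open("reads.fastq")))` streams through a file (the file name is a placeholder).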
We have also evaluated and used a number of third-party algorithms for assembling NGS reads, mapping reads to reference sequences, and calling SNPs, using both Illumina Genome Analyzer and Roche 454 GS-FLX sequence data. These developments were applied in several bacterial re-sequencing projects (including projects on Shigella, Klebsiella and Yersinia species), which involved the sequencing and comparison of a large number of strains.
Keywords: Genomics / Annotation / Integrated database / Transcriptome / Evolutionary genomics / Next-Generation Sequencing | ||
|
| Publications |
|
Barbe V, Cruveiller S, Kunst F, Lenoble P, Meurice G, Sekowska A, Vallenet D, Wang T, Moszer I, Médigue C, Danchin A (2009). From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology 155:1758-1775. (PMID: 19383706) |
| Web Site |
| More information on our web site |
Activity Reports 2009 - Institut Pasteur