| Genomic analysis |
| Director: Ivan MOSZER (moszer@pasteur.fr) |
|
Technical Platform 4 "Annotation" (PT4) covers a wide range of activities: assembly and initial annotation of new genome sequences, curation of annotated data collections, software development for genomic databases, genome analysis, and molecular phylogeny. A large fraction of this work is carried out in collaboration with other groups of the Institut Pasteur: research units, Génopole, computing center, and teaching department. |
|
Assembly and annotation of new genome sequences (L. Frangeul, B. Giletti, S. Mativet)
We are developing the CAAT-Box software ("Contig-Assembly and Annotation Tool-Box"), a set of methods for following the assembly phases of a genome sequencing project and for starting the annotation phase from the finishing step onwards (for further details about CAAT-Box, please see prior activity reports). Several collaborations with groups involved in large-scale sequencing projects are maintained through this software:

Annotated genomic databases (C. Boursaux-Eude, J.-C. Camus, L. Hummel, A. Marcel, M. Pryor)
Two additional bacterial genomes (Streptococcus pneumoniae R6 and TIGR4) were integrated into a GenoList-like database named StreptoPneumoList. Likewise, in the framework of a Transversal Research Program with the Institut Pasteur of Lille (C. Locht), the genomes of three organisms of the Bordetella group (B. pertussis, B. parapertussis, B. bronchiseptica) will soon be imported into new GenoList databases.
To facilitate the integration of bibliographical references into the GenoList databases, a new tool, BiblioDB, was developed. It allows the user to import all the references related to a given organism (e.g. through an EndNote file) and to link each reference to the relevant genes through a dedicated interface. These relationships are then integrated into the corresponding GenoList database. This tool was used for the reannotation of mycobacterial genomes (M. tuberculosis, M. bovis, M. leprae), in collaboration with the Unité de Génétique Moléculaire Bactérienne (S. Cole). For M. tuberculosis, over 80 new CDS were predicted, notably with the AMIGA program, and all the genes were assigned to a new functional classification. Many gene functions were revised on the basis of in-depth analysis of sequence comparison data and a careful survey of the recent literature (over 1,000 references). Proteomic data were also integrated into the TubercuList database.

GenoList: towards a new multi-genome version (S. Moreira, L. Hummel, A. Marcel, E. Quevillon)
GenoList databases and Web servers have been up and running for over ten years and are internationally recognized. They combine a user-friendly presentation of the data, an intuitive browsing model, and query or analysis tools tightly integrated in the application and closely linked to the data.
However, each server can handle only a single organism, or a set of closely related strains. We are in the process of developing a multi-genome version of GenoList. This first required enhancing the conceptual data model, especially by adding relationships between organisms. The user interface is also being completely rebuilt, and it will integrate new multi-genome comparison tools (coll. A. Le Roch, Unité des Cyanobactéries). On the technical side, we have decided to use a new development environment, WebObjects, which is well suited to the development of three-tier applications and allows us to maintain large Web applications quickly and easily.

SubScript: a transcriptome database (S. Moreira)
We built a database for the storage and analysis of transcriptome data, named SubScript. It allows the user to submit new transcriptome experiments obtained with micro- or macro-arrays. The data model was designed around the precise types of information required (MGED-compliant), and the submission interface was designed to be user-friendly. This work was performed in close interaction with the laboratories carrying out the experiments, especially the Génopole TP2 (J.-Y. Coppée, G. Lacourrège). Statistical analysis tools are now being developed (coll. L. Marisa, Unité de Génétique des Génomes Bactériens), taking into account the intrinsic complexity of the data and the experiment schema. The application is developed using the WebObjects Web application server.
This project was initiated in the framework of the "BACELL Network" program for the functional analysis of Bacillus subtilis, in collaboration with the Unité de Génétique des Génomes Bactériens (A. Danchin). New collaborations have been set up with IP groups participating in other functional analysis projects:
These collaborations will allow us to increase the amount of data available in our database, thus opening new opportunities for the cross-comparison of metabolic regulation in closely related organisms.

Phylogeny (C. Dauga)
Strategies to highlight gene evolution processes were defined in order to choose the best approaches for extracting phylogenetic information from gene sequences. This requires an in-depth understanding of the concepts used in phylogeny and knowledge of the limitations of phylogenetic tree construction models. These strategies are intended to help resolve biological problems encountered by many research groups in molecular identification studies or in the epidemiology of organisms:
These strategies are also required in genome exploration, to understand the evolutionary history of genes, lateral transfers, genetic recombination and duplications:
A new project for the definition of phylogenetic strategies to detect lateral gene transfers between neighbouring species is now being developed.

Teaching activities (C. Boursaux-Eude, C. Dauga, L. Frangeul)
Teaching is one of the missions of PT4:
Keywords: annotation, databases, genome, transcriptome, phylogeny |
| More information on our web site |
| Publications of the unit on Pasteur's references database |
| Office staff | Researchers | Scientific trainees | Other personnel |
| LUCHIER, Françoise, fluchier@pasteur.fr (part-time) | DAUGA, Catherine, IP, Research Scientist, cdauga@pasteur.fr; MOSZER, Ivan, IP, Research Scientist, moszer@pasteur.fr | PRYOR, Melinda, Post-doc fellow, mpryor@pasteur.fr | BOURSAUX-EUDE, Caroline, Engineer, cbx@pasteur.fr; DEHOUX, Pierre, Engineer, pdehoux@pasteur.fr; FRANGEUL, Lionel, Engineer, lfrangeu@pasteur.fr; HUMMEL, Laurence, Engineer, lhummel@pasteur.fr (fixed-term contract); MARCEL, Anne, Engineer, amarcel@pasteur.fr (fixed-term contract); MOREIRA, Sandrine, Engineer, moreira@pasteur.fr |