The BioInfo Projects



GMP-Tool-Box

Author : L. Frangeul

GMP-Tool-Box (GMPTB) is a software package developed for the computational part of a genome project. Its modules were designed for the Listeria monocytogenes genome project but are also applicable to other genomes. They fall into 3 groups corresponding to 3 steps of a genome project: shotgun follow-up, closure phase and annotation.

Follow-up: During the shotgun phase, GMPTB extracts all the characteristics of the assembly (number of contigs, number of sequences ...) from the result file (Phrap format) and displays them in a table. This table can be used by other programs to create graphs of the progress of the shotgun phase. Moreover, GMPTB compares each assembly result with the previous assembly and creates an HTML page explaining the relationship between old and new contigs (fusion, creation ...).

Closure phase: GMPTB contains tools to predict links between contigs. It searches for all the inserts whose ends lie in two different contigs and marks each such insert as a linking insert or as a misassembly, according to the orientation of each end and the distance between the 2 ends. These results can be obtained simultaneously for the different categories of libraries used in the shotgun phase (small, medium, BAC ...). GMPTB can also predict links by searching for similarities between the ends of the contigs and other sequences.

Annotation: GMPTB allows annotation to start during the finishing phase. In practice, GMPTB creates an Individual Protein File (IPF) for each ORF of an assembly. An IPF is a text file with a specific format which contains 3 categories of fields:
- The "minimum fields" contain the identification number, version number, location and sequences. The exported nucleotide sequence corresponds to the sequence of the ORF with 500 additional bases before the first stop codon and 200 additional bases after the second stop.
- The "automatic fields" contain results added to the IPF by different programs. These results can concern the ORF itself (homology, domains ...) but also, for instance, the search for RBS, promoters or terminators (before and after the ORF).
- The "manual fields" contain the results and comments added by the users.
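As a rough illustration of the nucleotide sequence stored in the "minimum fields", the sketch below cuts an ORF out of a contig together with its flanking bases. It is not GMPTB code: the function name `extract_with_flanks` and its parameters are hypothetical, coordinates are assumed to be 0-based, half-open and on the forward strand, and flanks are simply truncated at contig ends, a case the description above does not cover.

```python
def extract_with_flanks(contig, orf_start, orf_end, upstream=500, downstream=200):
    """Return the ORF plus flanking bases, and the ORF offset inside the slice.

    Coordinates are 0-based, half-open, forward strand (an assumption).
    Flanks are truncated when the ORF lies close to a contig end.
    """
    if not 0 <= orf_start < orf_end <= len(contig):
        raise ValueError("ORF coordinates fall outside the contig")
    left = max(0, orf_start - upstream)
    right = min(len(contig), orf_end + downstream)
    return contig[left:right], orf_start - left


# Toy contig: 600 bases, a 9-base ORF, then 300 more bases.
contig = "A" * 600 + "ATGAAATAG" + "C" * 300
region, offset = extract_with_flanks(contig, 600, 609)
print(len(region), offset)  # 709 500
```

Returning the offset alongside the slice keeps the ORF position recoverable even when the upstream flank is shorter than 500 bases.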
From one assembly to the next, GMPTB extracts all ORFs from the new assembly and creates new IPFs according to the old IPF sequences. GMPTB recognises the modified IPFs, which are the only ones submitted to a new automatic analysis after each assembly. Using this strategy, the user works with a single set of IPFs, independently of the progress of the closure phase. All the IPFs are also accessible through a web server and can therefore be modified and commented on by different groups during the genome project.
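One simple way to separate unchanged from modified IPFs, offered here only as a sketch, is to compare a digest of each ORF sequence between two assemblies. The text above does not say how GMPTB performs this comparison; the exact-match rule, the identifiers such as `IPF0001`, and the function `split_unchanged_and_modified` are all assumptions made for illustration.

```python
import hashlib


def digest(seq):
    """Case-insensitive fingerprint of a nucleotide sequence."""
    return hashlib.sha256(seq.upper().encode("ascii")).hexdigest()


def split_unchanged_and_modified(old_ipfs, new_orfs):
    """Match ORFs of a new assembly against the sequences of existing IPFs.

    old_ipfs: dict mapping an IPF identifier to its ORF sequence.
    new_orfs: list of ORF sequences extracted from the new assembly.
    Returns (unchanged, to_analyse): ORFs that match an old IPF exactly keep
    its identifier; the others would need a new automatic analysis.
    """
    by_digest = {digest(seq): ipf_id for ipf_id, seq in old_ipfs.items()}
    unchanged, to_analyse = {}, []
    for seq in new_orfs:
        ipf_id = by_digest.get(digest(seq))
        if ipf_id is None:
            to_analyse.append(seq)
        else:
            unchanged[ipf_id] = seq
    return unchanged, to_analyse


old = {"IPF0001": "ATGAAATAG", "IPF0002": "ATGCCCTGA"}
new = ["atgaaatag", "ATGCCGTGA"]
unchanged, to_analyse = split_unchanged_and_modified(old, new)
print(sorted(unchanged), to_analyse)  # ['IPF0001'] ['ATGCCGTGA']
```

An exact-match rule is the strictest possible choice; a real pipeline might instead tolerate small differences, which would need an alignment step rather than a digest.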

For more information, see the slide show.

A first release will be available on this web page soon.






Find Target

Farid Chetouani*, Philippe Glaser, Frank Kunst. Microbiology, October 2001; 147(10):2643-2649.

*Corresponding author : F. Chetouani

The increasing number of complete bacterial genomes available in the public databases offers new perspectives for linking phenotype and genotype. According to a phenotypic property, genomes can be clustered into two groups. The property could be, for instance, the capacity to grow or not on a medium containing an antibiotic, or the capacity to synthesize an outer membrane or not (Gram-/Gram+). If we assume that the genes responsible for a phenotype are conserved during evolution, candidate genes participating in the phenotype can be proposed by sequence homology search: they are present in one set of genomes and absent from another.

In this context, we propose a new tool for subtractive genome analysis with a web interface: FindTarget is a tool for identifying genes potentially specific to one or several species. During a session, the user defines all search parameters via an HTML form: (1) the list of genomes expected to share genes according to a user-defined homology criterion/threshold, (2) the list of genomes in which the same genes are absent according to a user-defined criterion/threshold. All the homology criteria (e.g. % identity) are extracted or computed from blastp comparisons of proteome versus proteome.

Other programs, such as SEEBUGS [1], are available for differential genome analysis. However, FindTarget differs from SEEBUGS in two main respects: (1) FindTarget is based on Blast search results while SEEBUGS is based on the Fasta program, (2) FindTarget and SEEBUGS integrate different similarity criteria.
1. Bruccoleri, R.E., Dougherty, T.J. and Davison, D.B. (1998) Concordance analysis of microbial genomes. Nucleic Acids Res, 26, 4482-6.
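
The selection logic described above can be sketched as a filter over precomputed best-hit scores. This is not FindTarget's code: the `best_identity` table (protein, then genome, then best blastp % identity), the genome names and the two default thresholds are hypothetical, and a single % identity value stands in for whatever criteria the user actually chooses.

```python
def candidate_genes(best_identity, present_in, absent_from,
                    min_present=40.0, max_absent=25.0):
    """Select proteins conserved in one genome set and missing from another.

    A protein is kept when its best hit reaches min_present in every genome
    of present_in and stays at or below max_absent in every genome of
    absent_from. A genome with no recorded hit counts as 0 % identity.
    """
    selected = []
    for protein, hits in best_identity.items():
        conserved = all(hits.get(g, 0.0) >= min_present for g in present_in)
        missing = all(hits.get(g, 0.0) <= max_absent for g in absent_from)
        if conserved and missing:
            selected.append(protein)
    return sorted(selected)


# Toy scores, invented for the example.
best_identity = {
    "protA": {"genome1": 82.0, "genome2": 75.5},
    "protB": {"genome1": 90.0, "genome2": 88.0, "genome3": 61.0},
    "protC": {"genome1": 30.0},
}
print(candidate_genes(best_identity, ["genome1", "genome2"], ["genome3"]))
# ['protA']
```

Keeping separate thresholds for presence and absence mirrors the two user-defined criteria in the form, and leaves a grey zone between them in which a gene is counted as neither present nor absent.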

Go to FindTarget Home Page


Unité GMP, Institut Pasteur, Département des Génomes et Génétiques, 28 rue du Dr Roux, 75724 Paris Cedex 15, France. Fax: +33 1 45 68 87 86