Microbial evolutionary genomics

Software development

Software development is not a priority of our group. Nevertheless, over the years we have developed a set of programs, some of which have reached a publishable state. These are described below. If you wish to know about the availability of other methods used in our publications, please contact us directly.

Repeatoire


Growthpred

Historical. Growthpred predicts minimal doubling times and analyzes codon usage bias in genomes and metagenomes. It was developed by Sara Silva, building on some of Eduardo's programs. The program/server itself is still unpublished; the reference below is a more research-oriented paper in PLoS Genetics.
Reference. Vieira-Silva S, Rocha EPC (2009) The Systemic Imprint of Growth Rates and its Uses in Ecological (Meta)genomics. PLoS Genet, in press.
Availability. http://mobyle.pasteur.fr/cgi-bin/portal.py?form=growthpred


Repseek

Historical. Repseek took a long time to reach print because it evolved over several papers written in collaboration by Eduardo Rocha, Alain Viari, Antoine Danchin, Eric Coissac, Pierre Netter and Guillaume Achaz. Eventually, Guillaume did most of the work in the final stage of the research. One strong point of Repseek is that it searches for repeats allowing for indels and mismatches, performing a BLAST-like alignment extended from seeds. The other strong point is that, as far as we know, it is the only program implementing appropriate statistics at both levels: the Karlin & Ost formulas for seeds, and BLAST-like statistics for degenerate repeats.
Abstract. Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases, at both the seed detection and alignment levels.
Reference. Achaz G, Boyer F, Rocha EPC, Viari A, Coissac E (2006) Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23:119-121.
Availability. http://wwwabi.snv.jussieu.fr/public/RepSeek/
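The two-step strategy described above (exact seeds first, then extension into approximate repeats with mismatches and indels) can be illustrated with a short Python sketch. It is not Repseek's code and makes several simplifying assumptions: it handles direct repeats only (no inverted repeats), scores extensions with arbitrary values (match 1, mismatch -1, gap -2) inside a fixed window, and chooses the seed length with a leading-order approximation in the spirit of Karlin & Ost rather than Repseek's exact formulas. The approximation used here: if two random letters are identical with probability lam = p_A^2 + p_C^2 + p_G^2 + p_T^2, the expected number of chance left-maximal exact repeats of length at least k in an i.i.d. sequence of length n is roughly (n^2/2)(1 - lam) lam^k. All names (`min_seed_length`, `find_seeds`, `extend`, `seed_and_extend`, `max_expected`, `window`) are hypothetical.

```python
import math
from collections import defaultdict


def min_seed_length(n, freqs, max_expected=0.01):
    """Smallest k such that (n^2/2) * (1 - lam) * lam**k <= max_expected,
    a leading-order estimate (assumption, not Repseek's formula) of the number
    of chance left-maximal exact repeats of length >= k in an i.i.d. sequence.
    lam is the probability that two random letters are identical."""
    lam = sum(p * p for p in freqs.values())
    pairs = n * n / 2.0
    k = math.log(max_expected / (pairs * (1.0 - lam))) / math.log(lam)
    return max(1, math.ceil(k))


def find_seeds(seq, k):
    """Pairs (i, j), i < j, where seq[i:i+k] == seq[j:j+k] (direct repeats
    only), keeping a pair only if the match cannot be extended to the left."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    seeds = []
    for positions in index.values():
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                # Skip shifted copies of a seed that starts further left.
                if i == 0 or seq[i - 1] != seq[j - 1]:
                    seeds.append((i, j))
    return seeds


def extend(x, y, match=1, mismatch=-1, gap=-2):
    """Gapped extension: best-scoring alignment anchored at x[0], y[0] that
    may stop anywhere. Returns (score, letters of x used, letters of y used)."""
    prev = [b * gap for b in range(len(y) + 1)]
    best = (0, 0, 0)  # the empty extension
    for a in range(1, len(x) + 1):
        cur = [a * gap] + [0] * len(y)
        for b in range(1, len(y) + 1):
            diag = prev[b - 1] + (match if x[a - 1] == y[b - 1] else mismatch)
            cur[b] = max(diag, prev[b] + gap, cur[b - 1] + gap)
            if cur[b] > best[0]:
                best = (cur[b], a, b)
        prev = cur
    return best


def seed_and_extend(seq, k, window=30):
    """Extend every seed to the right, and to the left on reversed prefixes.
    Returns (start1, end1, start2, end2, score) with half-open coordinates."""
    repeats = []
    for i, j in find_seeds(seq, k):
        rs, rx, ry = extend(seq[i + k:i + k + window], seq[j + k:j + k + window])
        ls, lx, ly = extend(seq[max(0, i - window):i][::-1],
                            seq[max(0, j - window):j][::-1])
        # The exact seed contributes k matches at the default match score of 1.
        repeats.append((i - lx, i + k + rx, j - ly, j + k + ry, k + rs + ls))
    return repeats
```

For a uniform base composition and n = 10^6, `min_seed_length` with `max_expected=0.01` gives 23. On a toy sequence where a 12-letter unit `R = "GTGGTTGTGTTG"` is copied twice with different flanks (`"AAAAA" + R + "AAAAA" + "CCCCC" + R + "CCCCC"`), the single left-maximal 8-letter seed is extended to the full copies, and `seed_and_extend(seq, 8)` returns `[(5, 17, 27, 39, 12)]`. Real tools also need to merge redundant hits and bound the cost of the quadratic seed enumeration on highly repeated k-mers, which this sketch does not attempt.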


Swelfe

Historical. Swelfe was developed mostly by Anne-Laure Abraham with help from Joël Pothier. The goal was a tool that searches for intragenic repeats in DNA sequences, protein sequences and protein structures.
Abstract. Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results.
Reference. Abraham AL, Rocha EPC, Pothier J (2008) Swelfe: a detector of internal repeats in sequences and structures. Bioinformatics 24:1536-1537.
Availability. http://bioserv.rpbs.univ-paris-diderot.fr/cgi-bin/swelfe
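As an illustration of finding internal repeats with dynamic programming, here is a minimal Python sketch: a Smith-Waterman local alignment of a sequence against itself, restricted to cells above the main diagonal so that the trivial self-match is excluded, plus an empirical p-value computed from shuffled sequences. This is a toy under stated assumptions, not Swelfe's algorithm: the scoring values, the shuffling test and the names (`best_internal_repeat`, `shuffle_pvalue`, `min_shift`, `trials`) are hypothetical, only the single best repeat is reported, and 3D structures are not handled.

```python
import random


def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2, min_shift=1):
    """Best local alignment of seq with itself, using only cells (i, j) with
    j >= i + min_shift so the main diagonal (trivial self-match) is excluded.
    Returns (score, end_i, end_j) with 1-based end positions."""
    n = len(seq)
    # H[i][j]: best local alignment score ending at seq[i-1] and seq[j-1];
    # cells outside the allowed band stay at 0 and act as a barrier.
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + min_shift, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # gap in the second copy
                          H[i][j - 1] + gap)    # gap in the first copy
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


def shuffle_pvalue(seq, trials=100, seed=0, **scores):
    """Empirical p-value: fraction of shuffled sequences (same composition)
    whose best internal repeat scores at least as high as the observed one."""
    rng = random.Random(seed)
    observed = best_internal_repeat(seq, **scores)[0]
    letters = list(seq)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if best_internal_repeat("".join(letters), **scores)[0] >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

For example, `best_internal_repeat("ACDEFGHIKL" * 2)` returns `(20, 10, 20)`: ten identical residues scoring 2 each, with the alignment ending at residue 10 of the first copy and residue 20 of the second. Restricting the table to the upper triangle is the simplest way to forbid the identity alignment; reporting further, non-overlapping repeats would require additional suboptimal-alignment steps that this sketch omits.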