Microbial evolutionary genomics

Software development

Software development is not a priority for our group. Yet, over the years, we have developed a set of programs, some of which reached a publishable state. These are described below. If you wish to know about the availability of other methods used in our publications, contact us directly.

MacSyFinder

Click here to access the MacSyFinder page.
Historical. MacSyFinder was developed by Sophie Abby in our lab, in close interaction with the CIB of Institut Pasteur, in particular Bertrand Néron and Hervé Ménager.
Abstract. Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathway) including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs, and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data like metagenomes. The components of the systems are searched by sequence similarity using hidden Markov model (HMM) protein profiles. The assignment of hits to a given system is decided based on compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context.
Reference. Abby SS, Néron B, Ménager H, Touchon M, Rocha EPC (2014) MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One 9: e110726.
Availability. https://github.com/gem-pasteur/macsyfinder and https://github.com/gem-pasteur/macsyview.
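Illustration. The abstract says that hits are assigned to a system when they comply with the content and organization of its model. The toy Python sketch below shows one way such a check could work. It is not MacSyFinder's code or API: the names Hit, SystemModel, cluster_hits and find_systems, and the parameters min_mandatory and max_gap, are hypothetical and chosen only for this example. It assumes that profile hits are already available as (gene, position) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str      # name of the protein profile that matched (hypothetical)
    position: int  # rank of the protein along the replicon


@dataclass(frozen=True)
class SystemModel:
    mandatory: frozenset  # components expected in the system
    accessory: frozenset  # components that may or may not be present
    min_mandatory: int    # content rule: minimum number of mandatory components
    max_gap: int          # organization rule: max genes between consecutive hits


def cluster_hits(hits, max_gap):
    """Group hits separated by at most max_gap intervening genes."""
    clusters, current = [], []
    for hit in sorted(hits, key=lambda h: h.position):
        if current and hit.position - current[-1].position - 1 > max_gap:
            clusters.append(current)
            current = []
        current.append(hit)
    if current:
        clusters.append(current)
    return clusters


def find_systems(hits, model):
    """Return the clusters of hits that satisfy the toy model."""
    known = model.mandatory | model.accessory
    relevant = [h for h in hits if h.gene in known]
    systems = []
    for cluster in cluster_hits(relevant, model.max_gap):
        found = {h.gene for h in cluster} & model.mandatory
        if len(found) >= model.min_mandatory:
            systems.append(cluster)
    return systems
```

In this sketch, a cluster is kept only if it passes both the organization rule (hits close together on the replicon) and the content rule (enough mandatory components). The actual program models more features, such as functional analogs and components shared between systems, as described in the abstract.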

Repeatoire

Click the above link or here to access the Repeatoire page.

Growthpred

Historical. Growthpred is a predictor of minimal doubling times and an analyzer of codon usage bias in genomes and metagenomes, developed by Sara Silva using some of Eduardo’s programs. The program/server is still unpublished; the reference is a more research-oriented paper in PLoS Genetics.
Abstract.
Reference. Vieira-Silva S, Rocha EPC (2010) The Systemic Imprint of Growth Rates and its Uses in Ecological (Meta)genomics. PLoS Genetics 6: e1000808.
Availability. http://mobyle.pasteur.fr/cgi-bin/portal.py?form=growthpred

Repseek

Historical. Repseek took a long time to go to print because it evolved over several papers done in collaboration by Eduardo Rocha, Alain Viari, Antoine Danchin, Eric Coissac, Pierre Netter and Guillaume Achaz. In the end, Guillaume did most of the work. One strong point of Repseek is that it searches for repeats allowing for indels and mismatches, doing a Blast-like alignment from seeds. The other strong point is that it is, as far as we know, the only program implementing appropriate statistics at both levels: for seeds using the Karlin & Ost formulas, and in a Blast-like manner for degenerate repeats.
Abstract. Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by their extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases both at the seed detection and alignment levels.
Reference. Achaz, Boyer, Rocha, Viari and Coissac (2006) Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23:119-21.
Availability. http://wwwabi.snv.jussieu.fr/public/RepSeek/
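Illustration. The two-step method described in the abstract (seed detection followed by extension) can be sketched in a few lines of Python. This is a deliberately simplified toy, not Repseek's algorithm: seeds are exact repeated words of a fixed length k, the extension goes rightwards only and allows mismatches but no indels, and no statistical significance is computed. The function names and the scoring parameters (match, mismatch, xdrop) are assumptions made for this example.

```python
from collections import defaultdict


def find_seeds(seq, k):
    """Step 1: return pairs (i, j), i < j, of positions sharing the same k-mer."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return [(pos[a], pos[b])
            for pos in index.values()
            for a in range(len(pos))
            for b in range(a + 1, len(pos))]


def extend_right(seq, i, j, k, match=1, mismatch=-1, xdrop=3):
    """Step 2: extend the seed at (i, j) to the right, without gaps.

    Stops when the running score falls xdrop or more below the best score
    seen so far. Returns (length, score) of the best-scoring extension.
    """
    score = best = k * match
    length = best_len = k
    while j + length < len(seq):
        score += match if seq[i + length] == seq[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= xdrop:
            break
    return best_len, best
```

For example, in the sequence GATTACAGGTTTTTGATTACTGG the only repeated 6-letter word is GATTAC (positions 0 and 14). Extending this seed to the right crosses one mismatch (A versus T) and then recovers two matches, so the two copies are reported as an approximate repeat of length 9.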

Swelfe

Historical. Swelfe was developed mostly by Anne-Laure Abraham with help from Joël Pothier. The goal was to develop a tool that searched intragenic repeats in DNA sequences, protein sequences and protein structures.
Abstract. Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results.
Reference. Abraham, Rocha and Pothier (2008) Swelfe: a detector of internal repeats in sequences and structures. Bioinformatics 24:1536-7.
Availability. http://bioserv.rpbs.univ-paris-diderot.fr/cgi-bin/swelfe