Evolutionary Microbial Genomics - CNRS URA 2171  

  HEAD: Eduardo PIMENTEL CACHAPUZ ROCHA / eduardo.rocha@pasteur.fr

  Annual Report

Our work has focused on three questions: 1) How and why are bacterial genomes organised? 2) How dynamic are genomes? 3) What results from the trade-off between organisation and dynamics? We found that fewer than 2,000 E. coli genes are present in all of the first 20 sequenced genomes. These shared genes form the core genome; the remaining genes make up the non-core part of the pan-genome. E. coli K-12 and S. enterica Typhimurium diverged an estimated one million years ago, i.e. about 10⁸ generations, and have a rearrangement rate higher than 10⁻⁴ per generation. Yet the relative order of their orthologous genes is practically identical. How can such high genome dynamics result in an organised genome? We investigated where the non-core genes were inserted, using the ancestral order of the core genome as a reference. While 51% of the locations showed not a single insertion or deletion in any of the 21 genomes, we found 133 locations with an average of more than 5 non-core protein-coding genes per genome. These locations accumulate 71% of the non-core pan-genome. This analysis revealed that, in most genomes, gene acquisition and loss take place at precisely the same locations, i.e. between the same two contiguous genes of the core genome (Figure 1). Hotspots therefore correspond to regions of abundant and parallel insertions and deletions of genetic material. While large insertions and deletions in E. coli have been abundantly described, our data show that these events systematically take place at the same regions in different genomes.
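The location-based analysis described above can be sketched as follows. This is a minimal illustration, not the group's pipeline: it assumes a hypothetical input table giving, for each genome, the number of non-core genes found at each core-genome location (location i being the interval between core genes i and i+1 in the ancestral order), and it simplifies "no insertion or deletion" to "no non-core gene in any genome". The function name `summarize_locations` and the toy counts are illustrative only; the threshold of more than 5 genes per genome mirrors the criterion in the text.

```python
from statistics import mean


def summarize_locations(counts_per_genome, hotspot_threshold=5.0):
    """Hypothetical helper: summarise where non-core genes sit along the core genome.

    counts_per_genome: one list per genome; entry i is the number of non-core
    genes found between core genes i and i+1 (ancestral order).
    Returns (fraction of locations empty in every genome,
             indices of hotspot locations,
             fraction of all non-core genes that fall in hotspots).
    """
    n_locations = len(counts_per_genome[0])
    # Transpose: one tuple of per-genome counts for each location.
    per_location = list(zip(*counts_per_genome))

    # Simplification: a location with zero non-core genes in every genome
    # is treated as showing no insertion or deletion.
    empty = sum(1 for loc in per_location if all(c == 0 for c in loc))

    # Hotspot: average number of non-core genes per genome above the threshold.
    hotspots = [i for i, loc in enumerate(per_location)
                if mean(loc) > hotspot_threshold]

    total = sum(sum(loc) for loc in per_location)
    in_hotspots = sum(sum(per_location[i]) for i in hotspots)
    share = in_hotspots / total if total else 0.0
    return empty / n_locations, hotspots, share


if __name__ == "__main__":
    # Toy data (invented): 3 genomes, 5 core-genome locations.
    toy = [
        [0, 12, 0, 1, 0],
        [0, 8, 0, 0, 0],
        [0, 10, 0, 2, 0],
    ]
    print(summarize_locations(toy))
```

On the toy data, three of the five locations are empty in every genome, location 1 is the only hotspot, and it holds most of the non-core genes, which is the qualitative pattern the text reports at genome scale.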

What leads to such hotspots? Most of these locations lack tRNA genes and phage integrases. This seriously challenges the widely held view that E. coli integration hotspots are mostly determined by the preference of phage-like integrases for inserting at tRNA genes. What else could create such hotspots? Selection for the integrity of genetic elements (e.g. genes) and of genome organisation (e.g. operons) reduces the number of locations where large insertions can occur without a significant loss of fitness. Since most transferred DNA has no adaptive value, once a permissive region acquires a large element, subsequent integrations there become more likely, because the region now offers a larger target for neutral insertion. The insertion of a large element in a permissive region thus produces a founder effect that increases the likelihood that this region becomes a hotspot. The concentration of most changes in a few local hotspots explains why whole-genome organisation is compatible with the extremely high turnover of genes by horizontal gene transfer in the species.
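The founder-effect argument can be illustrated with a toy simulation. This is an assumption-laden sketch, not the authors' model: it uses a Pólya-urn-style rule in which each new insertion picks one of a fixed set of permissive regions with probability proportional to 1 plus the DNA that region has already accumulated (the "larger target" idea), and compares this with a uniform rule. The functions `simulate` and `top_share`, the element size and all parameter values are hypothetical.

```python
import random


def simulate(n_locations, n_insertions, element_size, founder_effect, seed=0):
    """Toy model (hypothetical): distribute insertions among permissive regions.

    With founder_effect=True, a region's weight is 1 + the DNA it already holds,
    so regions that received an early element offer a larger neutral target
    (a Polya-urn process). Otherwise every region has the same weight.
    Returns the amount of inserted DNA accumulated in each region.
    """
    rng = random.Random(seed)
    accumulated = [0] * n_locations
    for _ in range(n_insertions):
        if founder_effect:
            weights = [1 + a for a in accumulated]
        else:
            weights = [1] * n_locations
        # Draw one region according to the current weights.
        i = rng.choices(range(n_locations), weights=weights)[0]
        accumulated[i] += element_size
    return accumulated


def top_share(accumulated, k):
    """Fraction of all inserted DNA held by the k most loaded regions."""
    total = sum(accumulated)
    return sum(sorted(accumulated, reverse=True)[:k]) / total


if __name__ == "__main__":
    uniform = simulate(100, 1000, 10, founder_effect=False, seed=1)
    founder = simulate(100, 1000, 10, founder_effect=True, seed=1)
    print("top-10 share, uniform:", round(top_share(uniform, 10), 2))
    print("top-10 share, founder effect:", round(top_share(founder, 10), 2))
```

Under the uniform rule, the ten most loaded regions hold only a small fraction of the inserted DNA; with the founder-effect rule, a few regions that happened to receive early elements capture most of it, so changes concentrate in hotspots while the rest of the genome stays largely untouched.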

Keywords: Molecular evolution, bioinformatics, bacterial genomics


Figure 1 - Number of genes (from 0 to 200) in insertions and deletions in the modern strains, plotted along the ancestral gene order of the core genome. The numbers on the x-axis give the order of genes in the core genome, which is the same as in E. coli K-12 MG1655.


  1. Rocha EPC (2008) The organization of the bacterial genome. Annu Rev Genet 42:211-33.

  2. Treangen T, Ambur OH, Tonjum T, Rocha EPC (2008) The impact of the neisserial DNA uptake sequences on genome evolution and stability. Genome Biol 9:R60.

  3. Touchon M, Rocha EPC (2007) Causes of insertion sequences abundance in prokaryotic genomes. Mol Biol Evol 24:969-81.

  4. Rocha EPC, Touchon M, Feil E (2006) Similar compositional biases are caused by very different mutational effects. Genome Res 16:1537-47.

  5. Hurst LD, Feil E, Rocha EPC (2006) Causes of trends of amino acid gain and loss. Nature 442:e11-2.

  Web Site

More information on our web site

Activity Reports 2009 - Institut Pasteur