
Microbial evolutionary genomics

Selection and mutation in bacteria

Evolution is an intrinsically stochastic process that involves a time lag between the generation of polymorphisms and their moderation by selection. During the very first stages of sequence diversification, only the few changes with the gravest selective consequences (i.e. cell death) have a very high probability of being removed (Figure 1). Yet most changes will ultimately be lost, and only a small minority will achieve fixation. The ratio of non-synonymous (dN) to synonymous (dS) changes between taxa is frequently computed to assay the strength and direction of selection. We noted that comparisons between closely related strains and/or species require a second parameter, namely the time since divergence of the two sequences under scrutiny. We demonstrated that a simple time lag model provides a general, parsimonious explanation of the extensive variation in the dN/dS ratio seen when comparing closely related bacterial genomes. There are taxon-specific differences in the change of dN/dS over time, which may indicate variation in selection or in population genetics parameters such as population size or the rate of recombination. Because time since divergence has such a critical effect, inter-taxa comparisons among asexuals are only meaningful when set against the expected trajectories of dN/dS over time. These results provide a common explanatory framework for data showing that, among very closely related bacterial genomes, the rate of non-synonymous changes is close to that of synonymous changes (Figure 1).
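The time lag argument can be illustrated with a small numerical sketch. This is not the published model: it assumes, purely for illustration, that lethal non-synonymous changes are never observed, that a fraction f_neutral of non-synonymous changes is neutral, and that a fraction f_deleterious is slightly deleterious and purged at a constant rate purge_rate. All three parameter names and the exponential purging are assumptions made here.

```python
import math

def expected_dn_ds(t, f_neutral, f_deleterious, purge_rate):
    """Expected dN/dS after divergence time t under a toy time lag model.

    Assumptions (illustrative only): synonymous changes accumulate linearly,
    slightly deleterious changes arise continuously and each survives purging
    with probability exp(-purge_rate * age). purge_rate must be positive.
    """
    if t <= 0:
        # No time for purging: every non-lethal change is still visible.
        return f_neutral + f_deleterious
    # Average survival of deleterious changes that arose uniformly in [0, t].
    surviving = (1.0 - math.exp(-purge_rate * t)) / (purge_rate * t)
    return f_neutral + f_deleterious * surviving

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the curve.
    for t in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(t, round(expected_dn_ds(t, 0.1, 0.6, 2.0), 4))
```

Under these assumptions the ratio starts near f_neutral + f_deleterious for very recent divergence and decays towards f_neutral, which is the qualitative behaviour described above.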

Analyzing a trait in genomes at very different divergence times allows selection to be decoupled from mutation. We applied this method to understand why some amino acids were reported to be increasing in frequency in genomes, whereas others were decreasing. We showed that this bias results simply from the time lag between the generation of polymorphisms and their removal by purifying selection. Several traits may be under selection in this process, but we were able to show that costly amino acids are systematically purged, so that their equilibrium frequency is lower than expected from purely mutational processes. This further shows the importance and power of accounting for population genetics theory when comparing closely related genomes.

 
Figure 1 – Left. The ratio of the instantaneous rates of neutral and deleterious changes (R0) is distorted by natural selection, in that lethal changes are immediately eliminated and other deleterious changes are progressively purged from populations. Differences between closely related genomes will mostly reflect mutational patterns, whereas differences between very distant genomes will show the imprint of selection. Right. The dS/dN ratio between two taxa as a function of the divergence in their intergenic DNA (approximately a measure of evolutionary distance). Full circles: Bacillus data; open circles: Staphylococcus data. Distance between intergenic sequences was computed as the percentage of sequence identity (from Rocha, JTB, 06 and Rocha, Curr Op Microbiology, 08).

We examined the molecular evolutionary consequences of the emergence of Shigella within E. coli through niche specialisation by comparing the normalised, directional frequency profiles of unique polymorphisms within 2098 orthologues representing the intersection of five E. coli and four Shigella genomes. We noted a surfeit of AT-enriching changes (GC->AT), transversions and non-synonymous changes in the Shigella genomes. By examining these differences within a temporal framework, we concluded that our results are consistent with relaxed or inefficient selection in Shigella owing to a reduced effective population size. Alternative interpretations, and the interesting exception of S. sonnei, are discussed. Finally, this analysis lends support to the view that nucleotide composition typically does not lie at mutational equilibrium, and that selection plays a role in maintaining a higher GC content than would result from mutation bias alone. This argument sheds light on the enrichment of adenine and thymine in the genomes of bacterial endosymbionts, where purifying selection is very weak.
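As a rough illustration of what a normalised, directional profile involves, the sketch below counts GC->AT and AT->GC changes and divides each by the number of sites that could have produced it. The function name, the input layout (pairs of inferred ancestral and derived bases) and the normalisation by site counts are assumptions made for this example, not a description of the published pipeline.

```python
from collections import Counter

GC = set("GC")
AT = set("AT")

def directional_profile(changes, gc_sites, at_sites):
    """Return (GC->AT per GC site, AT->GC per AT site).

    changes: iterable of (ancestral_base, derived_base) pairs, assumed to be
    polarised already (for example with an outgroup).
    gc_sites, at_sites: numbers of ancestral G/C and A/T sites surveyed.
    """
    counts = Counter()
    for ancestral, derived in changes:
        if ancestral in GC and derived in AT:
            counts["GC->AT"] += 1
        elif ancestral in AT and derived in GC:
            counts["AT->GC"] += 1
        # Changes within a class (G<->C, A<->T) do not alter GC content.
    return counts["GC->AT"] / gc_sites, counts["AT->GC"] / at_sites

if __name__ == "__main__":
    # Hypothetical polymorphisms and site counts, for illustration only.
    example = [("G", "A"), ("C", "T"), ("A", "G"), ("G", "C")]
    print(directional_profile(example, gc_sites=100, at_sites=50))
```

Normalising by the number of available sites matters because a GC-rich genome offers more opportunities for GC->AT changes even in the absence of any mutational bias.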


Figure 2 – Regression plot showing that dN/dS and +AT/+GC at four-fold degenerate sites closely fit a power law (adjusted R² = 0.83; P<0.001). Direct causality is not obvious, but this plot shows how both parameters covary with divergence time and suggests the imprint of selection on genome composition (from Balbi, Molecular Biology Evolution, 09).

The evolutionary role of natural transformation

Natural transformation is the process of sexual exchange between prokaryotes that most resembles eukaryotic sex, and its evolutionary role has been extensively debated. Following the discovery of the high frequency of horizontal gene transfer in bacterial genomes, most recent work has implicitly assumed that transformation’s role is to allow the acquisition of new functionalities. Transformation in Neisseria requires the presence of a specific 10-12 nt DNA uptake sequence (DUS) in the incoming DNA. Neisserial sex is an active process mediated by specific machineries capable of importing and exporting genetic information, and competence for transformation is constitutive throughout the growth cycle. Neisserial genomes have high densities of repeated elements and a nearly panmictic population structure. Intra-chromosomal recombination is a major source of variability in Neisseria, and it may also lead to recurring states of hypermutability. Hence neisserial genomes change very quickly by horizontal transfer, by non-homologous and homologous intra-chromosomal recombination, and by transient hypermutability. We thus assessed the functional and evolutionary roles that natural transformation could play in this system from a population genomics perspective.

If DUS are markers of selection for transformation, as commonly thought, their differential presence and conservation across a genome may also answer questions about the advantages of natural transformation. Indeed, we found that the distance between contiguous DUS in genomes closely matches the size of conversion fragments arising from recombination with DNA from other strains. We aligned six neisserial genomes and inferred the sets of genes that were recently acquired and lost. The DUS-abundant regions correspond to the core genome and were more conserved than other genomic domains, implying that transformation mediates conservation rather than variation of DNA. DUS were practically absent both from recently acquired genes and from ancient genes recently lost from the genome. They were also rare in contingency loci and in genes encoding variable surface components. Thus, DUS were associated with regions under strong purifying selection and nearly absent from regions under selective pressure for diversification.

Multiple genome alignment and gene conversion analysis of these genomes revealed that the spacing of DUS matched the average size of conversion fragments, suggesting that new DUS arise by recombination. This was also supported by the stringent conservation of DUS elements and by the few degenerate DUS that we detected, which indicate that DUS are in mutation-selection balance. Like many other human pathogens and commensals, Neisseria generate genetic variability through processes that locally or globally increase the rates of mutation. These events are expected to lead to frequent deleterious changes. Our results indicate that the evolutionary role of transformation in Neisseria is to counteract these mutation events and thereby Muller’s ratchet. Thus, rather than promoting variation, neisserial sex by transformation might be conservative.
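The DUS spacing comparison discussed in this section starts from a simple measurement: locate every DUS on either strand and record the distance between contiguous occurrences. The sketch below shows one way to do this. The 10-nt motif GCCGTCTGAA is the commonly cited neisserial DUS and is an assumption here, since the text above does not give the sequence; the function names are likewise hypothetical.

```python
# Commonly cited 10-nt neisserial DUS; not stated in the text above (assumption).
DUS = "GCCGTCTGAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def dus_positions(genome, motif=DUS):
    """0-based start positions of the motif on either strand, in order."""
    motifs = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] in motifs]

def dus_spacings(positions):
    """Distances between the starts of contiguous DUS occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]

if __name__ == "__main__":
    # Tiny synthetic sequence with one DUS on each strand.
    toy = "AAAA" + DUS + "CCCC" + reverse_complement(DUS) + "TT"
    hits = dus_positions(toy)
    print(hits, dus_spacings(hits))
```

On a real genome the resulting distribution of spacings could then be compared with the distribution of conversion fragment sizes inferred from the alignment, which this sketch does not attempt.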

Studying the dynamics of gene repertoires

Bacterial genomes undergo frequent reductions and expansions concomitant with changes in lifestyle. The most thoroughly studied cases are the genome reductions that occur among obligate symbionts, whether mutualists or pathogens. I have been working on this subject in recent years to understand how such changes occur over time and how they affect cellular biology. My work has focused on calibrating the rates of gene loss in Mycobacterium leprae and on understanding which genes are more prone to loss in enterobacteria.

We reconstructed the gene content and order of the last common ancestor of the human pathogens Mycobacterium leprae and Mycobacterium tuberculosis. During the reductive evolution of M. leprae, 1537 out of 2977 ancestral genes were lost, among which we found 177 previously unnoticed pseudogenes. We propose that the M. leprae lineage steadily lost small numbers of genes over the last 100 million years, and that a large proportion of the DNA of these genes has since disappeared from the genome. Yet we also find evidence of a massive gene extinction event that took place very recently in the lineage, over a short period of time, and led to the loss of hundreds of ancestral genes (Figure 3). A large proportion of their sequence (around 89%) still remains in the genome, which allowed us to characterize gene loss and date the pseudogenization events. The age of the pseudogenes was computed using a new methodology based on the rates and patterns of substitutions in the pseudogene and in the functional orthologs of closely related genomes. In essence, we take advantage of the fact that synonymous and non-synonymous substitution rates are very different before the pseudogenization event and similar afterwards, because purifying selection on the protein sequence disappears (Figure 3). The position of the lost genes in the ancestor’s genome revealed that loss of function and degradation mainly took place through a gene-by-gene inactivation process followed by the gradual loss of the corresponding DNA. This suggests a scenario of genome reduction through many small pseudogenization events leading to a highly specialized pathogen. It is consistent with my previous proposal that high densities of repeats may lead to frequent pseudogenization through deletions caused by repeat-mediated illegitimate recombination.
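The logic of dating a pseudogene from substitution rates can be sketched with a deliberately simplified calculation. It is not the published method. It assumes that, along the pseudogene lineage, synonymous sites evolve at a constant neutral rate r for the whole divergence time T, while non-synonymous sites evolve at rate omega * r until pseudogenization and at rate r afterwards. Then Ka/Ks = omega + (1 - omega) * t / T, where t is the age of the pseudogene. The names ka, ks, omega and divergence_time are introduced here for illustration.

```python
def pseudogenization_age(ka, ks, omega, divergence_time):
    """Estimate the age t of a pseudogenization event (toy model).

    ka, ks: non-synonymous and synonymous divergence along the pseudogene lineage.
    omega: Ka/Ks of the gene while functional (e.g. taken from functional orthologs).
    divergence_time: time T spanned by the lineage, in any unit.

    Solves Ka/Ks = omega + (1 - omega) * t / T for t.
    """
    if ks <= 0:
        raise ValueError("ks must be positive")
    if not 0 <= omega < 1:
        raise ValueError("omega must lie in [0, 1)")
    fraction = (ka / ks - omega) / (1.0 - omega)
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    fraction = min(1.0, max(0.0, fraction))
    return fraction * divergence_time

if __name__ == "__main__":
    # Hypothetical values: a gene under omega = 0.2 that now shows Ka/Ks = 0.6.
    print(pseudogenization_age(ka=0.06, ks=0.10, omega=0.2, divergence_time=100.0))
```

A gene that was inactivated only recently keeps a Ka/Ks close to omega, while one that has been a pseudogene for most of the lineage approaches Ka/Ks = 1, which is why the ratio carries information about the timing of the event.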

  
Figure 3 – Left. Synonymous and non-synonymous rates differ, and once sufficient time has elapsed to establish a steady state between the rates of creation and loss of slightly deleterious changes, the two dynamics are roughly linear. Pseudogenization leads previously non-synonymous positions to evolve neutrally. If synonymous positions evolve nearly neutrally, then one can infer the age of the pseudogenization events by comparing expected and observed trajectories of divergence through time (from Rocha, Curr Opinion Microbiology 2008). Right. Frequency distribution of the age of M. leprae pseudogenes (from Gomez-Valero, Genome Res 2007).

Genome size changes can only be understood in the light of cell functioning. For this, one must understand how cellular networks evolve at times of genome reduction. Eukaryotes often recruit duplicated genes into existing networks. In bacteria, by contrast, the low levels of gene duplication, coupled with the high probability of lateral transfer of novel genes, alter the manner in which protein-protein interaction (PPI) networks can evolve. By inferring the PPIs present in the ancestor of all contemporary Gammaproteobacteria, we were able to trace the changes in gene repertoires, and their consequences for PPI network evolution, in several bacterial lineages that have independently undergone changes in genome size. As genomes degrade, virtually all multi-partner proteins lose interactors; however, the overall average number of connections increases because proteins that interact with only one other partner are preferentially eliminated. We also studied the effect of lateral gene transfer on PPI network evolution by analyzing the connectivity of proteins that were gained along the lineage of E. coli, as well as those that were acquired and subsequently silenced in Shigella flexneri. The situation in PPI networks, in which newly acquired genes preferentially attach to the hubs of the network, contrasts with that observed in metabolic networks, which evolve by the peripheral gain and loss of genes, and in regulatory networks, in which high connectivity increases the propensity for loss. This shows that, in spite of recent claims, there is more diversity to biological networks than meets the eye when evolutionary processes are accounted for.
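The apparent paradox that every multi-partner protein can lose interactors while the average number of connections rises is easy to reproduce on a toy network. The example below is a minimal sketch with a made-up network (four mutually interacting proteins, each with one single-partner interactor); the function names and the data are hypothetical.

```python
from collections import Counter

def average_degree(edges):
    """Mean number of interaction partners over proteins with at least one partner."""
    nodes = {protein for edge in edges for protein in edge}
    if not nodes:
        return 0.0
    return 2 * len(edges) / len(nodes)

def drop_single_partner_proteins(edges):
    """Remove every protein that has exactly one partner, and its interaction."""
    degree = Counter(protein for edge in edges for protein in edge)
    return [(a, b) for a, b in edges if degree[a] > 1 and degree[b] > 1]

if __name__ == "__main__":
    core = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
    pendants = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
    ancestral = core + pendants
    reduced = drop_single_partner_proteins(ancestral)
    print(average_degree(ancestral), average_degree(reduced))
```

In this toy case each core protein drops from four partners to three, yet the network-wide average goes up because the poorly connected proteins no longer contribute to it.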

 
Figure 4 – Characteristics of Gammaproteobacterial genomes and of the reconstructed ancestor of these species (from Ochman, JEZ, 07).

Bacterial cooperation and the secretome

Microbes engage in a remarkable array of cooperative behaviours, secreting shared molecules that are essential for foraging, shelter, communication, microbial warfare and more. If the production of these shared proteins or metabolites involves a cost, then a population of cooperators will be vulnerable to exploitation by non-producing cheats. We have investigated how cooperation can arise and persist in microbial populations subject to migration and horizontal gene transfer. Our model predicts that differential gene mobility will drive intra-genomic variation in investment in social traits. More mobile loci will generate stronger among-individual genetic correlations at these loci (higher relatedness), and therefore allow the maintenance of more cooperative traits via kin selection. However, the presence of distinct relatedness coefficients for different loci within a genome will generate intra-genomic conflict over the extent to which an individual should behave cooperatively towards its neighbours.

We then verified several predictions of the model. Our principal result is that the frequency of genes coding for secreted proteins (the secretome) increases with gene mobility, supporting our prediction that gene mobility drives bacterial cooperation. Intra-genomic conflict over cooperation (aid to neighbours will benefit the most infectious genes more often than the non-mobile genes) is then revealed by the co-localisation of secretome genes with addictive systems such as toxin-antitoxin and restriction-modification systems, which will act to enforce cooperation on the chimeric bacterial individual. The degree of intra-genomic conflict, and the extent of any social dilemma in general, will depend strictly on the direct cost of producing the secreted gene product. We find that the biosynthetic cost of secretome genes is under intense selective pressure for cost reduction: secreted proteins are biosynthetically cheaper even than highly expressed proteins.

Finally, we demonstrate that mobile elements are in conflict with their chromosomal hosts over the social strategy of the chimeric ensemble bacterium, with mobile elements enforcing cooperation on their otherwise selfish hosts.


Figure 5 – Functions in the neighborhood of genes coding for localized proteins. Observed/expected co-occurrence of genes that are not in the core genome and that code for proteins localized in different cellular regions with integrases, restriction/modification systems and toxin/antitoxin systems. The distributions are significantly different from the expected values for all three types of genes (χ² test, p<0.0001) (from Nogueira, Curr Biology, 09).
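For readers unfamiliar with observed/expected ratios of this kind, the sketch below shows how expected co-occurrence counts follow from the row and column totals of a contingency table under independence, and how the χ² statistic is built from them. The counts are invented for illustration and the layout of the table (localisation classes in rows, presence or absence of a neighbouring system in columns) is an assumption, not the published data.

```python
def expected_counts(observed):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def observed_over_expected(observed):
    """Cell-wise observed/expected ratios (values above 1 mean enrichment)."""
    expected = expected_counts(observed)
    return [[o / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]

def chi_square_statistic(observed):
    """Pearson chi-square statistic; the p-value lookup is left out."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

if __name__ == "__main__":
    # Hypothetical counts: rows = secreted / cytoplasmic genes,
    # columns = next to a toxin/antitoxin system / not next to one.
    table = [[30, 70], [10, 90]]
    print(observed_over_expected(table))
    print(chi_square_statistic(table))
```

Converting the statistic into a p-value requires the χ² distribution with (rows - 1) * (columns - 1) degrees of freedom, which a statistics library would normally supply.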

Relevant references from the lab:

Balbi, KJ, EP Rocha, EJ Feil. 2009. The temporal dynamics of slightly deleterious mutations in Escherichia coli and Shigella spp. Mol Biol Evol 26:345-355.
Gomez-Valero, L, EPC Rocha, A Latorre, FJ Silva. 2007. Reconstructing the ancestor of Mycobacterium leprae: the dynamics of gene loss and genome reduction. Genome Res 17:1178-1185.
Hurst, LD, EJ Feil, EP Rocha. 2006. Protein evolution: causes of trends in amino-acid gain and loss. Nature 442:E11-12.
Nogueira, T, DJ Rankin, M Touchon, F Taddei, SP Brown, EP Rocha. 2009. Horizontal Gene Transfer of the Secretome Drives the Evolution of Bacterial Cooperation and Virulence. Curr Biol 19:1683-1691.
Ochman, H, R Liu, EP Rocha. 2007. Erosion of interaction networks in reduced and degraded genomes. J Exp Zool B Mol Dev Evol 308:97-103.
Rocha, EP. 2008. Evolutionary patterns in prokaryotic genomes. Curr Opin Microbiol 11:454-460.
Rocha, EPC, J Maynard Smith, LD Hurst, MT Holden, JE Cooper, NH Smith, E Feil. 2006. Comparisons of dN/dS are time-dependent for closely related bacterial genomes. J Theor Biol 239:226-235.
Treangen, TJ, OH Ambur, T Tonjum, EP Rocha. 2008. The impact of the neisserial DNA uptake sequences on genome evolution and stability. Genome Biol 9:R60.