In recent years, the scientific community has realized that different bacterial strains can play very different roles in an environment. The identification of these strains and their genomes is important for both basic research and human health. Scientists from the Institut Pasteur have developed a method to identify the genomes of different strains in the same environment.
For any given type of bacteria, some strains are harmless or even beneficial, while others are harmful. For example, Escherichia coli is a commensal bacterium (in other words, it is naturally present in our bodies and generally harmless) that can be found in the human gut flora. But some of its strains can be pathogenic or carcinogenic for humans. It is therefore vital to be able to distinguish pathogenic from non-pathogenic strains.
Bacterial strain separation – a difficult process
When scientists sequence a genome, they usually obtain a series of genome fragments that need to be put back in the right order, a process known as reconstruction. Thanks to recent progress in DNA sequencing technology and genome assembly algorithms, we can now reconstruct the whole genomes of several bacterial populations in an environment. This set of genomes is known as a metagenome. Within a metagenome, however, it is extremely difficult to distinguish between different strains of the same species using existing algorithms. These strains often have very similar sequences, and the differences between them can be confused with sequencing "background noise." As a result, current methods can only reconstruct bacterial genomes in which the differences between strains have been removed. These are known as "consensus" genomes.
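To make the idea of a "consensus" genome concrete, here is a minimal toy sketch in Python. It is not Strainberry's algorithm or that of any real assembler: the sequences, the read counts and the function names are all hypothetical, the reads are assumed to be error-free and already aligned, and a simple majority vote stands in for genome reconstruction. It only shows how a consensus can erase a minority strain, and how grouping reads by the positions where they differ could, in principle, recover both strains.

```python
from collections import Counter

# Hypothetical toy data: six error-free reads covering the same 12-base region.
# Four come from strain A and two from strain B, which differs at two positions.
STRAIN_A = "ACGTTGCAAGCT"
STRAIN_B = "ACGATGCAAGGT"
reads = [STRAIN_A] * 4 + [STRAIN_B] * 2


def consensus(aligned_reads):
    """Return the most common base in each column (simple majority vote)."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def variable_columns(aligned_reads, min_count=2):
    """Return columns where at least two bases are each seen min_count times."""
    columns = []
    for index, column in enumerate(zip(*aligned_reads)):
        frequent = [base for base, n in Counter(column).items() if n >= min_count]
        if len(frequent) >= 2:
            columns.append(index)
    return columns


def split_by_variants(aligned_reads):
    """Group reads by the bases they carry at the variable columns."""
    columns = variable_columns(aligned_reads)
    groups = {}
    for read in aligned_reads:
        key = "".join(read[i] for i in columns)
        groups.setdefault(key, []).append(read)
    return groups


# A single consensus over all reads keeps only the majority strain.
collapsed = consensus(reads)

# Building one consensus per group of reads keeps both strains.
groups = split_by_variants(reads)
per_strain = sorted(consensus(group) for group in groups.values())

print("single consensus:", collapsed)
print("per-strain consensus:", per_strain)
```

In this toy case the single consensus is identical to strain A, so strain B disappears entirely. With real data the task is far harder, because sequencing errors also create differences between reads and must be told apart from genuine differences between strains.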
Strainberry, a new method to reconstruct the genomes of different strains
Scientists from the Institut Pasteur have developed Strainberry, a new method based on the latest sequencing technologies and advanced algorithms. This method can be used to identify and correctly reconstruct the genomes of different strains in a metagenome.
"Strainberry combines species-level reconstruction (composed of consensus sequences) with the effective reuse of algorithms developed recently for human genomics. The effectiveness of Strainberry has been proven with artificial metagenomes, for which the results are already known. Strainberry was also tested on real metagenome data, where it reconstructed 20-118% of additional genomic material (depending on the sample and preliminary reconstruction) compared with existing approaches," explains Riccardo Vicedomini, the post-doctoral fellow behind this method in the Sequence Bioinformatics group led by Rayan Chikhi at the Institut Pasteur.
The method revealed the presence of several strains that had not previously been characterized. This research is a first step towards an increasingly precise characterization of metagenomes, some of which can be highly complex and contain thousands of different strains. Characterizing such metagenomes will require the development of new algorithms.
This research was carried out in collaboration with Christopher Quince from the Earlham Institute and Aaron Darling from the University of Technology Sydney.
Automated strain separation in low-complexity metagenomes using long reads, Nature Communications, July 23, 2021
R. Vicedomini1, C. Quince2,3,4, A. E. Darling5, R. Chikhi1
1 Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
2 Organisms and Ecosystems, Earlham Institute, Norwich, United Kingdom
3 Gut Microbes and Health, Quadram Institute, Norwich, United Kingdom
4 Warwick Medical School, University of Warwick, Coventry, United Kingdom
5 The iThree Institute, University of Technology Sydney, Ultimo, Australia