Microbes are the most common and diverse organisms on the planet. In recent years, researchers have accumulated vast amounts of data about them, particularly with respect to their genetic background. This avalanche of information - known as "Big Data" - is a fantastic source of new knowledge and medical progress. Researchers at the Institut Pasteur and EMBL’s European Bioinformatics Institute have combined their knowledge of bacterial genetics and web search to build a DNA search engine for microbial data. The search engine, described in a paper published in Nature Biotechnology, could enable researchers and public health agencies to use genome sequencing data to monitor the spread of antibiotic resistance genes. By making this vast amount of data discoverable, the search engine could also allow researchers to learn more about bacteria and viruses.
The search engine, called Bitsliced Genomic Signature Index (BIGSI), fulfils a similar purpose to internet search engines, such as Google.
The amount of sequenced microbial DNA is doubling every two years. Until now, there was no practical way to search this data even using computers, let alone manually.
“This type of search could prove extremely useful for understanding disease. Take, for example, an outbreak of multi-drug resistant bacteria in a hospital, where the cause is a Klebsiella strain containing a drug-resistance plasmid (a mobile DNA element that can spread drug resistance across different bacterial species). For the first time, BIGSI allows researchers to easily spot if and when the plasmid has been seen before by querying all available data,” explains Eduardo Rocha, head of the Microbial evolutionary genomics Laboratory at the Institut Pasteur. “By combining Pasteur’s expertise in mobile DNA elements and EBI’s BIGSI engine to scan all available data, we identified plasmids that have recently spread between very distantly related bacteria. These are the mobile elements capable of transferring antibiotic resistance genes from their natural reservoirs to the genomes of human pathogens.”
Google and other search engines use natural language to search through billions of websites. They are able to take advantage of the fact that human language is relatively unchanging. By contrast, microbial DNA shows the imprint of billions of years of evolution, and so each new microbial genome can contain new “language” that has never been seen before. The key to making BIGSI work was finding a way to build a search index that could cope with the diversity of microbial DNA.
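To make the idea of such an index more concrete, here is a toy sketch in Python. It is not BIGSI's code. It assumes a design in the spirit of the name "Bitsliced Genomic Signature Index": each dataset is summarised by a Bloom filter of its short DNA words (k-mers), and the filters are stored side by side so that one row of bits answers "which datasets set this bit?". The class name `ToyIndex`, the parameters `K`, `NUM_BITS` and `NUM_HASHES`, the hashing scheme and the sample sequences are all hypothetical and chosen only for illustration.

```python
import hashlib

K = 5            # k-mer length (toy value, assumed for illustration)
NUM_BITS = 256   # Bloom filter size per dataset (toy value)
NUM_HASHES = 3   # hash functions per k-mer (toy value)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer):
    """Map a k-mer to NUM_HASHES bit positions using salted SHA-256."""
    positions = []
    for salt in range(NUM_HASHES):
        digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


class ToyIndex:
    """Bit matrix with one row per bit position and one column per dataset.

    Each row is a Python int; bit j of a row belongs to dataset j.
    """

    def __init__(self):
        self.names = []
        self.rows = [0] * NUM_BITS

    def add(self, name, sequence):
        """Insert a dataset by setting its column bit in every hashed row."""
        column = len(self.names)
        self.names.append(name)
        for kmer in kmers(sequence):
            for pos in bit_positions(kmer):
                self.rows[pos] |= 1 << column

    def search(self, query):
        """Return names of datasets that may contain every k-mer of query."""
        if len(query) < K:
            return []  # too short to form a single k-mer
        hits = (1 << len(self.names)) - 1  # start with every dataset
        for kmer in kmers(query):
            for pos in bit_positions(kmer):
                hits &= self.rows[pos]  # keep datasets with this bit set
        return [n for j, n in enumerate(self.names) if hits >> j & 1]


# Hypothetical sequences, made up for this example.
index = ToyIndex()
index.add("sample_A", "ATGGCGTACGTTAGCCGATAGGCT")
index.add("sample_B", "TTTTTTTTTTTTGGGGGGGGGGGG")
# The query is a substring of sample_A, so sample_A is always reported.
# Bloom filters can also report false positives, never false negatives.
print(index.search("GCGTACGTTAGC"))
```

In this sketch, adding a new dataset only appends a column, so previously unseen k-mers need no change to the existing structure. That is one plausible way an index can cope with DNA "language" it has never seen before.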
Monitoring infectious diseases
“We were motivated by the problem of managing infectious diseases and antibiotic resistance,” explains Zamin Iqbal, Research Group Leader at EMBL-EBI. Bacteria can become resistant to antibiotics either through mutations or with the help of plasmids. Mutations in bacterial DNA can also be used as a historical record of bacterial ancestry, which allows researchers to infer, to some extent, how bacteria might spread across a hospital ward, a country or the world. BIGSI helps us study all of these things at massive scale. For the first time, it allows scientists to ask questions such as ‘has this outbreak strain been seen before?’ or ‘has this drug resistance gene spread to a new species?’.
A quick and easy search
“This search engine complements other existing tools and offers a solution that can scale to the vast amounts of data we’re now generating,” explains Phelim Bradley, Bioinformatician at EMBL-EBI. This means that the search will continue to work as the amount of data keeps growing, which was one of the biggest challenges the researchers had to overcome. The resulting search engine can be used by anybody with an internet connection.
“As DNA sequencing becomes cheaper, we will see a whole new host of users outside of basic research, and a rapid increase in the volume of data generated,” continues Iqbal. It is very likely that DNA sequencing will be used in clinics, or in the field, to diagnose patients and prescribe treatment, but it could also be used for a range of other things, such as checking what type of meat is in a burger. Making genomics data searchable at this point is essential and it will allow scientists to learn a huge amount about biology, evolution, the spread of disease, and much more.
Why focus on microbes?
A microbe is a living thing that is too small to be seen with the naked eye and requires a microscope. “Microbe” is a general term used to describe different types of life forms, including bacteria, viruses, fungi, and more.
Bacteria have been around for billions of years and live in nearly every habitat on Earth, including our intestines, the deepest parts of the ocean and the upper atmosphere. Bacteria reproduce extremely quickly, and bacterial species can be very different from each other.
By comparing the DNA of multiple bacterial species, we can start to understand how they are related and study the dynamics of antibiotic resistance as it spreads across the world and across the tree of life. For example, DNA analysis can help us predict how dangerous a certain strain of tuberculosis is and which drugs that particular strain would or would not respond to.
Real-time search of all bacterial and viral genomic data, Nature Biotechnology, February 4, 2019
Phelim Bradley1, Henk den Bakker2, Eduardo Rocha3,4, Gil McVean1, Zamin Iqbal5
1 University of Oxford
2 University of Georgia
3 Microbial evolutionary genomics Laboratory - Institut Pasteur
4 UMR3525 - CNRS.
5 European Molecular Biology Laboratory – European Bioinformatics Institute