Like many other research teams at the Institut Pasteur, the Bioinformatics and Biostatistics Hub has joined the fight against the COVID-19 pandemic, in its case by helping to curate data for GISAID (the Global Initiative on Sharing All Influenza Data). The Hub is the service division of the Computational Biology Department, and is composed of 50 experts in biostatistics and bioinformatics.
At the end of March 2020, following a discussion on phylodynamic questions with the Evolutionary Bioinformatics Unit at the Institut Pasteur, GISAID asked for help with processing the increasingly abundant data being submitted and with maintaining its quality. A solution for managing this essential resource was reached rapidly: the Hub agreed to the request, and since April 1st thirteen of its members have been curating the data received by the consortium on a daily basis.
The GISAID initiative, launched in 2006 in the wake of that year's bird flu epidemic, fosters the international sharing of influenza virus sequences together with the related geographical, clinical and epidemiological information. Its scope is now being extended to avian and other animal viruses, today including SARS-CoV-2, to help the scientific community understand how viruses evolve, spread and potentially trigger pandemics. (Notably, on January 30, 2020, the National Reference Center for Respiratory Viruses (Including Influenza) at the Institut Pasteur shared on this platform the complete sequences of viruses taken from two of the first French cases.) The Initiative guarantees that access to data in GISAID is free for everyone, provided that users log in and agree to respect the GISAID sharing mechanism governed by its database access agreement. Between January and April 15, 2020, Institut Pasteur teams submitted more than 130 SARS-CoV-2 genomes.
In practice, Hub members are on duty every day, from midday to midnight, to process the SARS-CoV-2 genomes submitted (from a few dozen to several hundred per day) and to validate the quality and reliability of the sequences and their metadata. The objective is to standardize the metadata so that the database is easier to search, and to check the consistency of the assemblies. As of April 15, 2020, more than 9,000 genomes are accessible on the GISAID website, of which almost 3,000 have been curated since April 1st with the help of the Hub. This data is used, among other things, by Nextstrain, an open source project that aims to provide a snapshot of the evolution of pathogen populations through a modern, responsive interface.
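To give a rough idea of what standardizing metadata and checking assemblies can involve, here is a minimal Python sketch. It is an illustration only, not the procedure used by GISAID or the Hub: the field names (`name`, `location`, `collection_date`), the function names and the thresholds are all hypothetical, and the reference length of 29,903 bases is that of the Wuhan-Hu-1 reference genome, taken here as an assumption.

```python
from datetime import datetime

# Simplification: only the four bases plus N; real curation would also
# have to deal with the other IUPAC ambiguity codes.
VALID_BASES = set("ACGTN")


def normalize_metadata(record):
    """Return a cleaned copy of a submission record (hypothetical field names)."""
    # Normalize "europe/ france /paris" to "Europe / France / Paris".
    parts = [part.strip().title() for part in record["location"].split("/")]
    # Accept only ISO dates (YYYY-MM-DD); anything else is flagged as None.
    try:
        parsed = datetime.strptime(record["collection_date"].strip(), "%Y-%m-%d")
        date = parsed.date().isoformat()
    except ValueError:
        date = None
    return {
        "name": record["name"].strip(),
        "location": " / ".join(parts),
        "collection_date": date,
    }


def sequence_issues(seq, expected_length=29903, length_tolerance=0.05,
                    max_n_fraction=0.05):
    """List basic problems found in an assembled genome (illustrative thresholds)."""
    seq = seq.upper()
    if not seq:
        return ["empty sequence"]
    issues = []
    if set(seq) - VALID_BASES:
        issues.append("unexpected characters")
    if abs(len(seq) - expected_length) > length_tolerance * expected_length:
        issues.append("unusual length")
    if seq.count("N") / len(seq) > max_n_fraction:
        issues.append("too many ambiguous bases")
    return issues
```

A record whose date cannot be parsed, or a sequence that returns a non-empty list of issues, would then be set aside for a curator to examine rather than rejected automatically.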
Alongside this work, the Hub remains available to campus scientists. More than ever, it is ready to contribute its skills in experimental design, data processing, analysis and modelling, as well as in software, pipeline and web application development, to priority projects related to COVID-19 research.