Scientists have recently expanded our knowledge of the human genome by detecting thousands of "long" variants using a novel technique.
The DNA variations that interest most scientists are highly localized. A change to a single nucleotide (a DNA building block) can radically alter the nature of the protein that is ultimately produced. Detecting these variations can be a complicated task: the usual techniques involve sequencing the entire genome of interest and then comparing it to a reference genome. Such a large quantity of data may contain errors, and the strategies generally used offer only an approximate estimate of the variations.
In 2021, to address these issues, scientists in the Sequence Bioinformatics team at the Institut Pasteur, led by Rayan Chikhi, developed an entirely new algorithm capable of pinpointing all the positions where variations are likely to occur in a given genome. Computational methods can then be used to examine each position and identify the variations unambiguously.
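As a rough illustration only, and not the team's actual algorithm, the idea of flagging candidate positions can be sketched by looking for short substrings (k-mers) that appear in a sequenced read but nowhere in the reference. The function names, the choice of k = 4, and the toy sequences below are all assumptions made for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_positions(read, reference, k=4):
    """Return start positions in `read` whose k-mer never occurs in `reference`.

    Hypothetical helper: a toy stand-in for detecting strings specific
    to a sample, not the published method.
    """
    ref_kmers = kmers(reference, k)
    return [i for i in range(len(read) - k + 1)
            if read[i:i + k] not in ref_kmers]


# Toy data (assumed): the read differs from the reference by a single
# substitution (G -> C) at index 6.
reference = "ACGTACGTTAGC"
read = "ACGTACCTTAGC"

positions = candidate_positions(read, reference)
print(positions)
```

In this toy case, every flagged start position corresponds to a k-mer that overlaps the changed nucleotide, so the flagged region brackets the variation. A later step would then examine that region in detail, which is not shown here.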
In a study published on December 22, 2022 in the journal Nature Methods, the scientists applied this algorithm to analyze "long" variants that are particularly difficult to detect, comprising hundreds of consecutive nucleotides. "The major breakthrough with our method is the detection of 10% more long variants than all other methods. 10% may seem small, but it represents a huge leap forward at whole-genome scale, as it concerns thousands of variants that were previously unknown," explained Rayan Chikhi as he described the study.
Denti, L., Khorsand, P., Bonizzoni, P., Hormozdiari, F., & Chikhi, R. (2022). Improved structural variant discovery in hard-to-call regions using sample-specific string detection from accurate long reads. Nature Methods (December 22, 2022)