
Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models

Journal Name: Genome Biology

The COVID-19 pandemic necessitates a mechanistic understanding of the worldwide spread of SARS-CoV-2 and diligent tracking of ongoing mutagenesis in order to plan robust strategies to confine its transmission. The availability of large numbers of sequences and their dates of transmission provides an unprecedented opportunity to analyze evolutionary adaptation in novel ways. The addition of high-resolution structural information reveals functional implications of these processes at the molecular level. Here we identify 1,087 haplotypes from 9,294 full-length genomes of SARS-CoV-2 and model their relative success based on half-life, frequency in the host population, and geographic distribution. We identify several mutations that are likely compensatory adaptive changes that allowed for rapid expansion. Contrary to previous reports, we find that the Asp614Gly mutation in the spike glycoprotein (S) was likely deleterious for transmission and that the addition of a subsequent mutation in the RNA-dependent RNA polymerase nsp12 led to the precipitous spread of the virus. We find a similar pattern for two mutations in the nsp13 helicase that allowed the virus to adapt to the Pacific Northwest of the USA. Structural analyses support our evolution-based results and provide a mechanistic understanding of the processes. Finally, we use an explainable artificial intelligence algorithm to identify a mutational hotspot in the signal recognition sequence of the spike glycoprotein, which may have implications for tissue- or cell-specific expression of the virus. These results provide valuable insights for the development of drugs or vaccines to successfully combat the current and future pandemics.
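The abstract does not spell out how haplotypes are scored, so the following is only a minimal sketch of the general idea of grouping genomes into haplotypes and summarizing each one by frequency and geographic spread. It assumes each genome has already been reduced to a set of differences from a reference. The `Sample` type, the `summarize_haplotypes` function, the mutation labels, and the region names are all hypothetical, and the observation span in days is a crude stand-in, not the half-life measure used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """One sequenced genome (hypothetical record layout)."""
    mutations: frozenset  # differences from a reference genome, as labels
    collected: date       # sampling date
    region: str           # sampling location


def summarize_haplotypes(samples):
    """Group samples with identical mutation sets and summarize each group."""
    groups = defaultdict(list)
    for s in samples:
        groups[s.mutations].append(s)

    total = len(samples)
    summary = {}
    for hap, members in groups.items():
        dates = [m.collected for m in members]
        summary[hap] = {
            "count": len(members),
            # Share of all sampled genomes carrying this haplotype.
            "frequency": len(members) / total,
            # Number of distinct regions in which the haplotype was seen.
            "regions": len({m.region for m in members}),
            # Days between first and last observation (NOT a half-life).
            "span_days": (max(dates) - min(dates)).days,
        }
    return summary


# Toy records, invented purely for illustration.
samples = [
    Sample(frozenset(), date(2020, 1, 5), "region_a"),
    Sample(frozenset({"m1"}), date(2020, 2, 1), "region_b"),
    Sample(frozenset({"m1", "m2"}), date(2020, 2, 20), "region_b"),
    Sample(frozenset({"m1", "m2"}), date(2020, 3, 10), "region_c"),
]
result = summarize_haplotypes(samples)
```

Using a `frozenset` of mutation labels as the dictionary key means two genomes fall into the same haplotype exactly when they carry the same set of differences, regardless of the order in which those differences were recorded.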