AI-Powered Discovery Uncovers 70,000 New RNA Viruses, Opening Doors to Agricultural Health Breakthroughs


Researchers have used artificial intelligence (AI) to uncover more than 70,000 previously unknown RNA viruses, many of which are unlike any species known to science. The discovery, detailed in a study published in Cell, relied on metagenomics, a technique that analyses genetic material from environmental samples without the need to culture individual viruses. Applying AI in this context has opened new avenues for exploring the vast ‘dark matter’ of the RNA virus universe.

Viruses are ubiquitous microorganisms that infect animals, plants, and even bacteria, yet only a small fraction have been identified and described. There is “essentially a bottomless pit” of viruses to discover, says Artem Babaian, a computational virologist at the University of Toronto in Canada. Some of these viruses could cause diseases in people, which means that characterizing them could help to explain mystery illnesses, he says.

Previous studies have used machine learning to find new viruses in sequencing data. The latest study, published in Cell this week, goes a step further by also analysing predicted protein structures. Its AI model incorporates a protein-structure prediction tool called ESMFold, which was developed by researchers at Meta (formerly Facebook), headquartered in Menlo Park, California.

A common way to find RNA viruses is to look for the section of the genome that encodes RNA-dependent RNA polymerase (RdRp), a key enzyme these viruses use to replicate their RNA. But if the sequence encoding this protein in a virus is vastly different from any known sequence, researchers won’t recognize it. To find such previously unrecognized viruses, Shi Mang, an evolutionary biologist at Sun Yat-sen University in Shenzhen, China, and a co-author of the Cell study, searched publicly available genomic samples together with his colleagues.
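The limitation of similarity-based searching can be illustrated with a toy sketch. It is not the BLAST or profile-HMM searches that real pipelines typically use. Instead, it scores a query against reference sequences by k-mer overlap (Jaccard similarity) and calls a hit only above a threshold. The reference and query strings, `K`, `THRESHOLD`, and all function names are hypothetical and chosen for illustration; they are not real RdRp sequences or parameters from the study.

```python
# Toy illustration of similarity-based detection (all data hypothetical).
# Real pipelines use tools such as BLAST or profile HMMs; this only mimics
# the idea "flag a query if it resembles an already known sequence".

K = 3            # hypothetical k-mer length
THRESHOLD = 0.3  # hypothetical minimum similarity needed to call a hit

# Hypothetical "known RdRp" protein fragments (not real sequences).
KNOWN_RDRPS = [
    "MKLSGDDSVLRTWE",
    "MKISGDDCVLKTWE",
]


def kmers(seq: str, k: int = K) -> set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the k-mer sets of two sequences."""
    ka, kb = kmers(a), kmers(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_similarity(query: str) -> float:
    """Highest similarity of the query to any known reference."""
    return max(jaccard(query, ref) for ref in KNOWN_RDRPS)


def is_recognized(query: str) -> bool:
    """Flag the query as a putative RdRp only if it resembles a reference."""
    return best_similarity(query) >= THRESHOLD


if __name__ == "__main__":
    close_variant = "MKLSGDDSVLKTWE"  # one substitution away from a reference
    divergent = "QRTANDEHYPFWCA"      # hypothetical stand-in for a very divergent RdRp
    for name, seq in [("close", close_variant), ("divergent", divergent)]:
        print(name, round(best_similarity(seq), 2), is_recognized(seq))
```

With these toy values, the one-substitution variant scores 0.6 and is flagged. The divergent stand-in shares no 3-mers with either reference, scores 0.0 and is missed, even if it were a genuine polymerase.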

They developed a model, called LucaProt, using the ‘transformer’ architecture that underpins ChatGPT, and fed it sequencing and ESMFold protein-prediction data. They then trained their model to recognize viral RdRps and used it to find sequences that encoded these enzymes — evidence that those sequences belonged to a virus — in the large tranche of genomic data.
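The general idea of feeding both sequence and structure information into an attention-based classifier can be sketched in plain Python. In this sketch, per-residue sequence features and per-residue structure features (standing in for ESMFold-derived data) are concatenated, passed through a single scaled dot-product self-attention layer, mean-pooled, and mapped to a probability that the protein is an RdRp. This is a heavily simplified assumption, not LucaProt's actual architecture or code. A real transformer also uses multiple heads, residual connections, normalisation and feed-forward layers. The feature values, dimensions, randomly initialised weights and function names (`rdrp_probability`, `init_params`) are hypothetical, and the model is untrained, so its output carries no biological meaning.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over per-residue vectors."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    scores = matmul(q, transpose(k))  # n_res x n_res similarity scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def rdrp_probability(seq_feats, struct_feats, params):
    """Return (P(RdRp), attention weights) for one protein (untrained sketch)."""
    # Concatenate sequence and structure features residue by residue.
    x = [s + t for s, t in zip(seq_feats, struct_feats)]
    h, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    # Mean-pool over residues into a single protein-level vector.
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(p * w for p, w in zip(pooled, params["w_out"])) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit)), weights


def init_params(d_in, d_model, seed=0):
    """Build hypothetical, randomly initialised parameters."""
    rng = random.Random(seed)
    return {
        "wq": random_matrix(d_in, d_model, rng),
        "wk": random_matrix(d_in, d_model, rng),
        "wv": random_matrix(d_in, d_model, rng),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(d_model)],
        "b_out": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    n_res, d_seq, d_struct = 5, 4, 3  # hypothetical toy dimensions
    seq_feats = [[rng.gauss(0, 1) for _ in range(d_seq)] for _ in range(n_res)]
    struct_feats = [[rng.gauss(0, 1) for _ in range(d_struct)] for _ in range(n_res)]
    params = init_params(d_seq + d_struct, d_model=8)
    prob, attn = rdrp_probability(seq_feats, struct_feats, params)
    print(f"P(RdRp) = {prob:.3f} (untrained, so not meaningful)")
```

In a real system, the weights would be fitted on labelled examples of RdRp and non-RdRp proteins before the classifier was used to screen new sequences. The sketch only shows how the two kinds of input can be combined in one model.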

Using this method, they identified some 160,000 RNA viruses, including some with exceptionally long genomes and some found in extreme environments such as hot springs, salt lakes and air. Just under half of them (roughly 70,000) had not been described before. They found “little pockets of RNA virus biodiversity that are really far off in the boonies of evolutionary space”, says Babaian.

This discovery expands our understanding of viral diversity and underscores the potential of AI to transform virology and related fields. Characterizing these novel viruses can help scientists better understand viral evolution and ecology, as well as the viruses' potential effects on human health and agriculture. Integrating AI into metagenomic analyses marks a significant step towards mapping the vast and largely uncharted world of RNA viruses.
