
A Generative Pre-trained Transformer for Genomic Photosynthesis

Project Details

Principal Investigator
Funding Source
Office of Science

Training AI to read the language of biology

Scientists at ORNL are using AI to optimize plant photosynthesis

Bridging Genes to Traits with a Unified AI Framework

Photosynthesis depends on thousands of genes, proteins, and environmentally responsive processes working together. This complexity makes it difficult for traditional models to predict how genetic changes affect plant performance. GPTgp (the Generative Pre-trained Transformer for Genomic Photosynthesis) addresses this challenge by using artificial intelligence to learn from multiple types of biological data at once.

The model integrates DNA sequences, gene expression, protein structures, and measurements of plant traits like gas exchange and hyperspectral imaging. Using a transformer-based architecture, GPTgp brings these data into a shared representation space, allowing researchers to uncover predictive links between genes and photosynthetic traits. The capability will enable scientists to prioritize genetic changes to test before conducting lengthy field trials.
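As a rough illustration of what a "shared representation space" means, the sketch below maps feature vectors of different sizes into vectors of one common size so they can be compared directly. It is not GPTgp's architecture, which this page does not describe. The modality names, feature values, `SHARED_DIM`, and the random projection matrices are all assumptions made for the example; in a trained model the projections would be learned rather than random.

```python
import math
import random

SHARED_DIM = 4  # hypothetical size of the shared space


def make_projection(in_dim, out_dim, rng):
    """Build a random out_dim x in_dim matrix (a stand-in for learned weights)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, features):
    """Multiply a matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


rng = random.Random(0)

# Toy inputs with different dimensionality per modality (made-up values).
sample = {
    "dna": [0.2, 0.1, 0.4, 0.3],
    "expression": [1.5, 0.3],
    "gas_exchange": [28.0, 0.35, 410.0],
}

# One projection per modality, all targeting the same output size.
projections = {name: make_projection(len(v), SHARED_DIM, rng) for name, v in sample.items()}
embedded = {name: project(projections[name], v) for name, v in sample.items()}

for name, vec in embedded.items():
    print(name, [round(x, 3) for x in vec])
print("dna vs expression:", round(cosine(embedded["dna"], embedded["expression"]), 3))
```

Once every modality has the same vector length, a single model can attend over all of them together, which is the property the paragraph above relies on.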

Conceptual image showing corn crops transitioning to an arid landscape, with molecular models and a rising bar chart
Elite crops capture only about half as much carbon dioxide as nature's top performers, a gap GPTgp aims to close. Credit: Andy Sproles/ORNL, U.S. Dept. of Energy.

Unlocking Nature's Untapped Potential

Even after decades of crop improvement, today’s most productive crops photosynthesize at only about half the rate that some plants achieve in nature. The most elite cultivars reach photosynthesis rates of around 30 μmol CO₂ per square meter per second, but desert-adapted species such as Amaranthus palmeri can exceed 70 μmol CO₂ per square meter per second under high light, warm temperatures, and ample water.

This gap represents an enormous opportunity to improve crop productivity; however, photosynthesis operates across many levels—from molecules to leaves to whole plant canopies—and includes complex feedback loops and trade-offs, making simple improvements difficult to predict. For example, a genetic change that boosts yield in one species may reduce growth in another. Predicting these outcomes requires more advanced, integrative tools.
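The size of the gap can be checked with quick arithmetic on the two rates quoted above. The snippet treats "around 30" and "can exceed 70" as exact values, which is a simplification, so the result is approximate.

```python
# Rates quoted in the text, in micromoles of CO2 per square meter per second.
ELITE_CROP_RATE = 30.0  # "around 30" for the most elite cultivars
TOP_WILD_RATE = 70.0    # "can exceed 70" for Amaranthus palmeri

fraction = ELITE_CROP_RATE / TOP_WILD_RATE
headroom = TOP_WILD_RATE - ELITE_CROP_RATE

print(f"Elite crops reach about {fraction:.0%} of the top natural rate")
print(f"Headroom: about {headroom:.0f} umol CO2 per square meter per second")
```

With these figures the elite rate is roughly 43% of the top natural rate, consistent with the "about half" description.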

Young green plant growing upward, overlaid with DNA strands, molecular labels, weather maps, and digital data graphics
Compress the biodesign cycle from years of trial and error to months of targeted validation with GPTgp. Credit: Andy Sproles/ORNL, U.S. Dept. of Energy.

Photosynthesis as Language

GPTgp draws on a fundamental principle in biology: photosynthesis is shaped by both genes and the environment. This makes it possible to apply large language model concepts to biological systems.

In GPTgp, the elements of DNA sequences act like words in a sentence, preserving meaningful patterns such as codon usage and regulatory motifs. Similarly, protein structures, gene expression, scientific images, environmental conditions, and physical measurements all become part of a shared biological vocabulary. By learning this “language of photosynthesis,” GPTgp has the potential to become the first AI model capable of “reading” photosynthetic biology.
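To make the "words in a sentence" analogy concrete, here is a minimal sketch that splits a coding sequence into codons and assigns each distinct codon an integer ID, the way a language model assigns IDs to words. The page does not say how GPTgp tokenizes DNA, so codon-level tokenization is an assumption here, and the sequence, function names, and vocabulary scheme are hypothetical.

```python
from collections import Counter


def codon_tokens(seq):
    """Split a sequence into non-overlapping 3-base tokens, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


toy_sequence = "ATGGCTGCTAAATGA"  # made-up sequence, not real data

tokens = codon_tokens(toy_sequence)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
usage = Counter(tokens)  # codon usage counts for this sequence

print(tokens)
print(token_ids)
print(usage.most_common(1))
```

Keeping codons intact as tokens is one way the codon-usage pattern mentioned above could survive tokenization; other schemes, such as overlapping k-mers or learned subword units, would make different trade-offs.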

A researcher analyzes digital data overlays; the image also shows a crop field with a drone flying overhead
GPTgp translates genetic sequences from diverse plant species into a shared biological vocabulary, learning the language of photosynthesis. Credit: Andy Sproles/ORNL, U.S. Dept. of Energy.

Using AI to Predict Plant Performance

GPTgp is designed to accelerate the design-build-test cycle in plants. Researchers can screen candidate genes, alleles, and engineering strategies computationally before committing to costly multi-year field trials and can prioritize variants most likely to improve real-world photosynthetic performance across different species and environments. 
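The screening step described above amounts to ranking candidates by a predicted score and carrying only the top few forward to field trials. The sketch below shows that ranking in its simplest form. The `Candidate` type, the variant names, and the scores are placeholders invented for the example, not GPTgp outputs or its interface.

```python
import heapq
from typing import NamedTuple


class Candidate(NamedTuple):
    variant_id: str
    predicted_gain: float  # hypothetical model score, arbitrary units


def shortlist(candidates, k):
    """Return the k candidates with the highest predicted gain, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: c.predicted_gain)


# Placeholder candidates with made-up scores.
candidates = [
    Candidate("variant_A", 0.12),
    Candidate("variant_B", -0.03),
    Candidate("variant_C", 0.31),
    Candidate("variant_D", 0.07),
]

top = shortlist(candidates, 2)
for c in top:
    print(c.variant_id, c.predicted_gain)
```

In practice a shortlist would likely also weigh prediction uncertainty and the cost of each experiment, which this sketch leaves out.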

The model also supports learning across pathways and taxa, where insights from well-studied model organisms can inform engineering in bioenergy crops. By generating predictions for new genotypes and conditions, GPTgp can serve as an intelligent assistant for the next generation of plant scientists and breeders working toward more productive and resilient food and energy systems.

Accelerating Discovery Science

The GPTgp project is part of the Genesis Mission—DOE’s bold new endeavor to build the world’s most powerful scientific platform to accelerate discovery science, strengthen national security, and drive energy innovation. GPTgp is supported by the DOE Biological and Environmental Research program. 

Lianhong Gu
Contact
Distinguished Research Scientist
865-241-5925 | LIANHONG-GU@ORNL.GOV