John Lagergren: Cultivating AI for a plant science revolution

John Lagergren is creating neural networks to quickly analyze the vast data acquired by automated imaging stations in the Advanced Plant Phenotyping Laboratory, or APPL, at Oak Ridge National Laboratory. Credit: Carlos Jones/ORNL, U.S. Dept. of Energy

At the crossroads of mathematics and engineering, John Lagergren is building powerful artificial intelligence tools to dramatically advance the pursuit of hardy crops for everything from biofuels and biomaterials to natural carbon storage. The work fits well with his career goal of pursuing impactful science.

Lagergren, a staff scientist in Oak Ridge National Laboratory’s Plant Systems Biology group, is using his expertise in applied math and machine learning to develop neural networks to quickly analyze the vast amounts of data on plant traits amassed at ORNL’s Advanced Plant Phenotyping Laboratory, or APPL. Neural networks are a type of machine learning model that mimics the structure of the human brain and, when well trained, can perform tasks such as pattern recognition and decision-making.

The greenhouse-like APPL contains one of the most diverse suites of imaging capabilities in the world dedicated to the automated collection of measurements as plants grow, including size, biomass accumulation, photosynthetic activity, water and nitrogen content, stress response and biochemical composition. The high-resolution data collected by APPL as plants move through day and night allow scientists to quickly identify which genes underpin traits of interest and assess whether genetic modifications made to plants result in improved physical characteristics.

“Measuring all of those traits by hand is a time-consuming, miserable process that I’ve done myself in the field and greenhouse,” Lagergren said. “What we’re working toward instead is a system using AI and automation to take images and extract biologically meaningful traits for genomic analysis, and doing so in a way that is faster, more accurate, and collects data that a human can’t see.”

APPL uses a range of imaging modes, from hyperspectral to 3D plant modeling, to generate as much as 1 petabyte per year of raw data. Creating an AI model to quickly parse that information is a unique challenge owing to the diversity of APPL experiments, which may involve different types of plants being evaluated for various traits such as tolerance to nutrient or drought stress, or the ability to grow faster and larger.

“Coming up with algorithms that generalize across all of those different conditions is non-trivial,” Lagergren said.

Lagergren is at ease in the world of big data and AI, having first come aboard at ORNL as a postdoctoral researcher in the Computational and Predictive Biology group. There, he participated in a variety of projects, including global-scale, long-term climate analysis to assess potential impacts on food, bioenergy and pandemics; modeling to predict and preempt animal-to-human zoonotic virus spillover events; and most relevant to his work today, a project training neural networks to analyze the form and vein structure of poplar tree leaves using images captured in the field. The poplar tree is a key bioenergy crop being advanced at ORNL.

John Lagergren created a leaf-scanning algorithm that can speed phenotyping work by quickly analyzing scanned images of plants to identify important traits. Credit: John Lagergren/ORNL, U.S. Dept. of Energy

For APPL, Lagergren’s goal is a single, large-scale machine learning model that’s in line with the general AI trend of creating fewer but much larger, more capable models. “ChatGPT is an example of one such foundation model, where you essentially have downloaded the entire Internet and pre-trained a model with an appropriate depth of knowledge to serve a variety of uses,” Lagergren said. Similarly, the neural network he’s building will be trained on APPL data and can be fine-tuned for multiple downstream scientific use cases.

“To pull off a project like this you need two main ingredients: a lot of data and a lot of computing power. We check both those boxes at ORNL,” Lagergren said. “APPL has been producing huge amounts of data over the last two years, and ORNL has the fastest supercomputer for open science in the world: Frontier.” The supercomputer, the world’s first exascale system for open science, is part of the Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility.

APPL isn’t the only project keeping Lagergren busy these days. He’s also working on AI models that use climate and soil data to predict soil organic carbon in a project for the DOE Bioenergy Technologies Office. The project’s goal is to predict the potential for natural carbon storage in any given location in the United States to prioritize areas conducive to growing bioenergy crops. In a similar project, Lagergren is creating a model using climate data, topography information and soil data to predict suitable areas for cover crops that can serve as additional bioenergy feedstocks, preventing soil erosion and providing more income to farmers.

Finding the right kind of math

It might come as a surprise that Lagergren hated math in high school and struggled in classes such as trigonometry. However, as an undeclared freshman at East Tennessee State University, or ETSU, he explored a variety of subjects and signed up for an illuminating calculus class.

“That’s when the light bulb went off that this math that I hated so much, it’s not just arbitrary or there to make you miserable — it’s useful in the real world,” Lagergren said. “You can use it to predict how things change over time or space. The experience began to satisfy a pull I felt toward engineering and problem-solving, where I didn’t know math could make a difference.”

John Lagergren with one of the imaging stations in the Advanced Plant Phenotyping Laboratory, or APPL, at ORNL. Credit: Carlos Jones/ORNL, U.S. Dept. of Energy

Lagergren went on to earn his bachelor’s degree in applied mathematics at ETSU, and both a master’s and a doctorate in the subject at North Carolina State University. His interest in machine learning took off while he was a graduate student and continued during his internships helping a private company grow its AI capabilities.

When the time came to pursue a postdoctoral position, Lagergren was already familiar with ORNL as a leader in the field of computational science and supercomputing and was interested in the lab’s biosciences research. “When you’re talking about science and Tennessee, ORNL is a core place,” Lagergren said. He came aboard as a postdoc in 2021 and was hired into his current position two years later.

Lagergren touts the resources and expertise at the lab for helping drive his scientific success. “The data that APPL continues to generate, and having the computing to ingest and crunch big data with Frontier to learn patterns are definitely exciting,” he said. “Everyone I’ve had the privilege of working with here has been great. There’s just so much interesting stuff to talk about, and with people from different disciplines with fantastic expertise. It’s different from being at a university or working in industry — ORNL is in a space where you have incredible resources and are purpose-driven to solve important problems with consequences for the nation and the world.”

Lagergren is enthusiastic about the work ahead as he and his colleagues create a new paradigm for plant science with APPL’s capabilities. 

Slicing big APPL data for better plants

“I’m excited about the time series aspect — that’s the potential APPL holds. You can go out and conduct a field study or work in the greenhouse and get a snapshot in time, but APPL’s collecting data every day,” Lagergren said. “You’re watching plants at different growth stages and in variable conditions over time.”

“Being able to distill high-dimensional image traits into closed-form equations that you can then use to make time series predictions and estimate parameters — those parameters alone can become targets for genomic analysis. We’re essentially driving new mathematical models that don’t yet exist of how plants change, how their roots change, how those things interact, the role of the environment and different nutrient conditions. You gain a better understanding of the dynamics that drive response and resilience.”
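
A toy example may help picture the idea of turning a time series into a few parameters. The sketch below is illustrative only and is not APPL's pipeline or Lagergren's model. It assumes a single image-derived trait (a hypothetical daily leaf-area measurement), takes the logistic growth curve as the closed-form equation, and uses synthetic, noise-free data. The function names and the candidate range for the plateau size K are assumptions made for the example. It estimates three parameters per plant: plateau size K, growth rate r and midpoint day t0.

```python
import math


def logistic(t, K, r, t0):
    """Closed-form logistic growth curve: trait size at day t."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_logistic(days, sizes, k_candidates):
    """Scan candidate K values; for each, linearize the curve and fit r, t0.

    For a fixed K, log(size / (K - size)) = r * day - r * t0 is a straight
    line in day, so r and t0 follow from a simple linear fit.
    """
    best = None
    for K in k_candidates:
        if K <= max(sizes):
            continue  # the transform needs every size strictly below K
        logits = [math.log(s / (K - s)) for s in sizes]
        r, intercept = fit_line(days, logits)
        t0 = -intercept / r
        sse = sum((logistic(t, K, r, t0) - s) ** 2
                  for t, s in zip(days, sizes))
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    if best is None:
        raise ValueError("no candidate K exceeds the largest observed size")
    _, K, r, t0 = best
    return {"K": K, "r": r, "t0": t0}


# Synthetic, noise-free daily measurements (hypothetical leaf area in cm^2).
days = list(range(30))
sizes = [logistic(t, K=120, r=0.3, t0=15) for t in days]

params = fit_logistic(days, sizes, k_candidates=range(100, 141))
print(params)

# The fitted curve can also extrapolate beyond the observed days.
print(logistic(35, **params))
```

Real measurements are noisy and rarely follow one simple curve, so a practical workflow would likely rely on a nonlinear optimizer and richer models. The point here is only that a month of daily measurements collapses into a handful of interpretable numbers that can be compared across plants.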

One of the ultimate goals, Lagergren said, “is to transform bioenergy crops to grow in places where they don’t want to grow, because you can’t take over corn fields and other food crop areas — that is not an option. So, we want plants that will thrive in harsh environments or in the presence of stresses such as disease or pests. The challenge is to identify the genetic changes needed to make plants more resilient, to grow more quickly with fewer resources and with better biomass composition. We want to enable a bioenergy future where we can also offset the carbon we’re putting into the atmosphere.”

His advice for young scientists? “Emphasize understanding rather than just memorization,” Lagergren said. “It’s easy to memorize the material as an undergrad and regurgitate it on a test. But the real world isn’t that simple. You need to understand the fundamentals so when the parameters change and you’re presented with a new challenge, you can draw from that fundamental understanding to solve the problem in front of you.”

That approach is one that works particularly well in deploying machine learning to solve problems, he noted. “It’s better to have a problem that you’re very passionate about, to learn as much as you can, dive deep and pick up the nuances. You learn a lot more that way compared to a very shallow understanding over a broad range.”

Lagergren said he’s personally motivated by two foundational drivers in his work: solving complex puzzles that have a lot of variables and don’t follow the standard rules, and delivering solutions to real-world problems. “That’s the nice thing about working at ORNL, knowing that we are trying to solve world-class challenges in areas such as bioenergy, plant resilience, and zoonotic spillovers. I’m not just solving some obscure mathematical problem in a vacuum, but one with actual consequences when it’s done right.”

UT-Battelle manages ORNL for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit