Using high-performance computing to study co-evolutionary networks in poplar
A cross-disciplinary research team at Oak Ridge National Laboratory is using supercomputing to create an unprecedented view of the 3D interactions among components of the cellular machinery in Populus trichocarpa (black cottonwood), a fast-growing perennial tree that shows promise as a low-cost sustainable feedstock for biofuels production.
The project, titled “Co-evolutionary Networks: From Genome to 3D Proteome,” integrates biology, big data, mathematics, statistics, artificial intelligence (AI), and high-performance computing to predict protein structures and molecular binding events that lead to plant characteristics known as phenotypes.
“To our knowledge, nobody has tried to do this,” said project lead Dan Jacobson. “To take predictive information and effectively try to build the 3D model of so many interactions in the cell. No one has been crazy enough to try this.”
This massive set of new data will increase fundamental understanding of metabolism, identify potential chemical intervention targets for enhancing and disrupting plant systems, and inform future research initiatives by providing a virtual method for honing highly targeted hypotheses before they are evaluated in physical experiments.
To achieve these goals, the research team will draw on existing data and build a series of predictive models to integrate complex layers of information. Team members include Jacobson, Julie Mitchell, Xiaolin Cheng, Gerald Tuskan, and Timothy Tschaplinski of ORNL’s Biosciences Division as well as Jeremy Smith, University of Tennessee-ORNL Governor’s Chair for Molecular Biophysics; Wayne Joubert of the National Center for Computational Sciences at ORNL; and Stephen DiFazio of West Virginia University.
Connecting genotypes to phenotypes
Matching variations in the genome with differences in physical, molecular, chemical, microbial and other phenotypes is the underlying goal of the project. Rather than examining that scientific problem at a single-gene-to-single-effect level, the research team is studying the complex interactions of multiple genes.
These combinatory effects are much closer to real-world scenarios of how genes and the proteins they encode act at the cellular level to influence an organism’s phenotypes and even affect how that organism interacts with its environment, Jacobson said.
To understand these multi-gene interactions, the research team is starting with a wealth of poplar phenotype data generated in previous studies conducted by the laboratory through the Plant Microbe Interface Project and the BioEnergy Science Center. This data will continue to grow with contributions from the new Center for Bioenergy Innovation led by ORNL.
With this incredibly dense genomic map, scientists have pinpointed 28 million places, known as single nucleotide polymorphisms (SNPs), where there is variation across the population. Using a statistical technique called a genome-wide association study (GWAS), combined with AI and computational models, the team is associating SNPs with variations in phenotype.
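As a rough illustration of the idea behind associating SNPs with a phenotype, the sketch below scores each SNP by how strongly its genotype tracks a measured trait across individuals. It is a minimal toy under stated assumptions, not the team's pipeline: the SNP names, the 0/1/2 allele-count coding, the phenotype values, and the helper names `pearson_r` and `rank_snps` are all hypothetical, and a production GWAS would also correct for population structure and multiple testing.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_snps(genotypes, phenotype):
    """Return (snp, r) pairs sorted by absolute correlation with the phenotype."""
    scores = {snp: pearson_r(dosages, phenotype) for snp, dosages in genotypes.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: six individuals, genotypes coded as minor-allele counts (0, 1, 2).
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [1, 1, 0, 2, 0, 2],
    "snp_c": [0, 0, 0, 0, 0, 0],  # no variation, so it cannot be associated
}
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]  # hypothetical trait measurements

ranked = rank_snps(genotypes, phenotype)
for snp, r in ranked:
    print(f"{snp}: r = {r:.3f}")
```

In this toy data set, `snp_a` ranks first because its allele count rises with the trait value, while the invariant `snp_c` scores zero.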
Identifying co-evolutionary networks
Local correlations between groups of genes that are inherited together as blocks of DNA are common. Jacobson and his team are examining non-local correlations, in which genes lie on completely different parts of the genome but are highly correlated with each other across the population.
The researchers are working under the hypothesis that these physically distant but associated genes represent a co-evolutionary signature and are involved in a common function.
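The distinction between local and non-local correlation can be sketched in a few lines. The example below is a simplified illustration, not the project's method: it flags pairs of variants that are strongly correlated across individuals yet sit on different chromosomes or far apart on the same one. The SNP names, positions, genotype values, both thresholds, and the function `nonlocal_pairs` are hypothetical assumptions chosen for the demonstration.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def nonlocal_pairs(snps, min_abs_r=0.8, min_distance=1_000_000):
    """Find correlated SNP pairs that are physically distant.

    snps maps a name to (chromosome, position, allele counts per individual).
    Both thresholds are illustrative assumptions, not values from the project.
    """
    hits = []
    for (name_a, (chrom_a, pos_a, dos_a)), (name_b, (chrom_b, pos_b, dos_b)) in combinations(
        snps.items(), 2
    ):
        is_distant = chrom_a != chrom_b or abs(pos_a - pos_b) >= min_distance
        if not is_distant:
            continue  # nearby variants are expected to correlate; skip them
        r = pearson_r(dos_a, dos_b)
        if abs(r) >= min_abs_r:
            hits.append((name_a, name_b, r))
    return hits


# Hypothetical genotypes for eight individuals.
snps = {
    "snp_1": ("Chr01", 10_500, [0, 1, 2, 0, 1, 2, 0, 2]),
    "snp_2": ("Chr01", 11_200, [0, 1, 2, 0, 1, 2, 0, 2]),  # close to snp_1
    "snp_3": ("Chr07", 98_000, [0, 1, 2, 0, 1, 2, 1, 2]),  # distant but correlated
    "snp_4": ("Chr12", 5_000, [2, 0, 1, 1, 2, 0, 1, 0]),
}

pairs = nonlocal_pairs(snps)
for name_a, name_b, r in pairs:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

Here the perfectly correlated neighbors `snp_1` and `snp_2` are ignored as a local block, while their correlation with `snp_3` on another chromosome is reported as a candidate non-local signal.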
Understanding these interactions is important to future efforts targeting selective phenotype changes. Modifying a gene that would improve the digestibility of poplar for easier biofuels production, for instance, would not be successful if that change caused the plant to be susceptible to a pathogen.
Predicting protein structures
Once the team identifies genes involved in common functions, the next step is to visualize their physical structure and interactions.
Few protein crystal structures are known for poplar, but structures of related proteins have been solved in other species. Through a computational technique known as homology modeling, the researchers will find proteins in other species that resemble the poplar proteins of interest in sequence and carry out similar functions. Using the solved structures from those species as templates, the team can predict the 3D structure of individual poplar proteins.
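The template-selection step of homology modeling can be illustrated with a small sketch. It assumes the candidate sequences have already been aligned to the poplar target (gaps written as "-"), scores each candidate by percent identity, and keeps the best one above a cutoff. The sequences, the template names, the 30 percent cutoff, and the functions `percent_identity` and `pick_template` are hypothetical; real workflows rely on dedicated alignment and modeling software rather than anything this simple.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_template(target, candidates, min_identity=30.0):
    """Return (name, identity) of the most similar candidate, or None if all fall below the cutoff.

    The 30 percent default is an assumed rule-of-thumb cutoff for this sketch.
    """
    scored = [(name, percent_identity(target, seq)) for name, seq in candidates.items()]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[1])
    return best if best[1] >= min_identity else None


# Hypothetical pre-aligned sequences (one-letter amino acid codes).
target = "MKTAYIAKQR"
candidates = {
    "template_x": "MKTAYLAKQR",  # one substitution
    "template_y": "MRS-YIGK-R",  # several substitutions and two gaps
}

best = pick_template(target, candidates)
print(best)
```

The chosen template's solved structure would then serve as the scaffold for building the target model, which is the part handled by specialized modeling tools.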
With the resulting protein structures refined through molecular dynamics simulations, the researchers will combine the systems biology information and the data on co-evolutionary networks to predict protein-protein interactions. These findings will be used to identify probable binding sites for protein-protein docking, with the goal of determining structural models of putative protein complexes.
Increasing fundamental understanding
Armed with a library of approximately 300 million small molecule structures, the research team will examine small molecule docking against the poplar protein structures. This information will increase fundamental understanding of metabolism and potentially provide chemical intervention targets for enhancing and disrupting plant systems.
Testing these targets and other hypotheses through physical experimentation will inform and improve the simulations and could yield new scientific discoveries.
This research is supported by the DOE Office of Science through an allocation of 70 million hours on the Titan supercomputer as part of the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. Data from ORNL Laboratory Directed Research Fund projects as well as the Plant Microbe Interface Project and BioEnergy Science Center, funded by the DOE Office of Science Biological and Environmental Research program, are integral to this project.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov.