The Earth System Grid Federation, an international multi-institutional initiative that gathers and distributes data for top-tier simulations of the Earth’s climate, is preparing a series of upgrades that will make using the data easier and faster while improving how the information is curated.
A new project called ESGF2, led by the U.S. Department of Energy’s Oak Ridge National Laboratory in collaboration with Argonne and Lawrence Livermore national laboratories, is part of international efforts to upgrade the global data system that is integral to simulations of future climate. These widely respected simulations are made by scientists working with the Coupled Model Intercomparison Projects for the World Climate Research Programme.
“ESGF data are about the future of life on Earth,” said Forrest Hoffman, lead for the DOE-funded ESGF2 project and the Computational Earth Sciences group at ORNL. “By providing scientists easy access to the full collection of international models, ESGF enables them to construct the very best understanding about the potential future trajectories of our climate.”
A key ESGF mission is to support the data needs of scientists conducting global climate change research, notably those who prepare the United Nations Intergovernmental Panel on Climate Change’s comprehensive climate assessments, released every six to seven years. ESGF Earth system data underpin landmark IPCC reports such as the recent Sixth Assessment Report, AR6, and its working group findings. The data also inform IPCC special reports focused on climate vulnerabilities, adaptation scenarios and mitigation strategies.
Another important aspect of the ESGF’s mission is to ensure that scientific investigation is transparent, collaborative and reproducible, given the data’s direct impact on climate research worldwide and their potential use in government and commercial decision-making.
“Almost all of the Earth system model data that go into the IPCC reports are stored in the ESGF,” said Hoffman. These data are provided by more than 50 climate research institutions around the world. “The federation is a unique consortium of community-minded institutions that aims to get data into the hands of the tens of thousands of researchers and stakeholders who analyze it and compare it with observational data to constantly update our best projections of the future.”
In the new DOE ESGF2 project, computational scientists are working to improve data discovery, access and storage in direct collaboration with international federation partners. Their work will leverage the latest software tools, cloud computing resources, the world’s most powerful supercomputers and DOE’s Energy Sciences Network, or ESnet. ESnet currently enables 100 gigabit-per-second transfer rates among national laboratories and connections to other national and international networks, universities and research centers. An upgrade launched this fall will boost ESnet transfer rates to as much as 400 Gbps.
“Working with federation partners in the United Kingdom, Germany, France, other European Union nations, Australia and other countries, we will develop and deploy a modernized and cyber-secure system for providing access to model output data to the global scientific community,” Hoffman said.
The federation operates a network of large computer nodes hosted in the United States and 17 other countries, functioning as one global data archive. ORNL, ANL and LLNL are collaborating with these partners to improve the reliability and scalability of the system, providing a smooth data replication process that ensures the broader scientific community has access to data from all ESGF sites. ORNL and ANL are hosting a dual backup of the more than 8 petabytes (and counting) of the most popular subset of ESGF data, taking advantage of the world-class computing systems operated at the labs.
Developing robust user interfaces and secure, reliable archives
The multiyear ESGF2 project has already replicated existing data and is providing the storage and computational services needed to deliver data for the user community while it builds out new infrastructure and services to augment ongoing international federation efforts. ESGF partners created a roadmap in 2020 to guide the development work of all federation partner activities.
ORNL brings substantial experience with big data centers and large-scale modeling and simulation to its leadership role in the DOE ESGF2 project. The lab is home to the Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility whose Frontier exascale computing system was recently ranked as the world’s fastest, as well as the Climate Change Science Institute, which brings together data experts, modelers and experimentalists to accelerate understanding of climate change and its impacts.
“ORNL is in the unique position of knowing about big data and also knowing about climate and serving as host to very large data centers and the interfaces that make that information easily accessible by scientists around the world, and we are excited to be adding new resources to ESGF,” Hoffman said.
The Argonne Leadership Computing Facility, a DOE Office of Science user facility, contributes its unique capabilities to the federation, along with the Globus research data management system, which the University of Chicago operates for the research community. Globus services will be used in the ESGF2 deployment for authentication and for data indexing, access and replication.
“The terabytes and petabytes generated by the climate models of today require new approaches to data management and analysis,” said Ian Foster, ANL lead for the project. “We will enable not only faster download of data subsets, but also previously infeasible data analyses on ANL and ORNL supercomputers.”
LLNL also brings extensive high-performance computing capabilities, a long history of leadership in model intercomparison community activities through its Program for Climate Model Diagnosis and Intercomparison, and experience in leading U.S. development of ESGF.
“The upgrades will make it easier and faster for users to access the data that can help us better understand what climate will look like in the future,” said Sasha Ames, LLNL lead for ESGF2.
The ESGF2 project is sponsored by the Biological and Environmental Research program within DOE’s Office of Science. ESGF is co-funded by DOE, the National Aeronautics and Space Administration, the National Oceanic and Atmospheric Administration and the National Science Foundation in the U.S., by the Infrastructure for the European Network for Earth System Modelling in Europe and by numerous international research institutions and academic partners.
UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science. — Stephanie Seay
This story has been updated with additional detail.