This article first appeared on the U.S. Department of Energy Office of Science website. As part of the Year of Open Science, the DOE Office of Science is highlighting its Public Reusable Research (PuRe) Data Resources. The PuRe Data Resources are authoritative sources that make data easier to find, access, and reuse across the broader scientific community. This article highlights the Atmospheric Radiation Measurement (ARM) Data Center at DOE’s Oak Ridge National Laboratory.
From the Arctic to the Amazon, understanding the atmosphere is key to understanding our climate and other Earth systems. The ARM Data Center collects and manages global observational and experimental data amassed by the Department of Energy Office of Science’s Atmospheric Radiation Measurement user facility. For the past 30 years, it has been making this data accessible to scientists around the world who study and model the Earth’s climate.
The ARM Data Center gathers and curates some 50 terabytes of data per month. This data comes from more than 460 instruments located in climate-critical locations worldwide. The data center processes and packages the information from these instruments into over 11,000 distinct data products. These data products include daily records of temperature, wind speed, humidity, cloud cover, atmospheric particles called aerosols, and dozens of other atmospheric properties. These properties are critically important to understanding and predicting the processes that drive climate and weather.
Scientists in 37 countries freely use these data products, and their work has yielded more than 400 academic papers and presentations in the last year alone. The massive data archives at ARM comprise more than 4,000 terabytes of information. That number is expected to climb to 5,500 terabytes by the end of 2023.
The scientific community has high regard for ARM’s data products, which both U.S. and international scientists routinely use in their research. The data products have also been cited in reports by the United Nations Intergovernmental Panel on Climate Change, which are used to inform governments worldwide about ongoing and future climate risks.
Based at DOE’s Oak Ridge National Laboratory, the ARM Data Center is supported by high-performance computing clusters. The Cumulus cluster at ORNL provides 16,000 processing cores to ARM users. That computing power enables advanced model simulations, big-data storage, big-data analytics, and machine learning.
The ARM Data Center is continuously modernizing its computing architecture and software to give users a seamless experience. The Data Discovery platform provides a menu of user-friendly tools for finding, ordering, and analyzing data. Additionally, the new ARM Data Workbench is under development. It will provide users with an integrated data-computing ecosystem with open-source tools. These tools can be modified and shared to give users even more ways to easily find and analyze data.
In addition to being designated as a DOE Office of Science PuRe Data Resource, the ARM Data Center has been recognized as a CoreTrustSeal repository. It is also a member of the World Data System. The center applies the FAIR data principles of Findability, Accessibility, Interoperability, and Reusability in its data management practice. Following the FAIR principles helps ensure that data is findable and useful for repeatable research. These principles are especially important as scientists increasingly rely on data digitization and artificial intelligence.
UT-Battelle manages ORNL for the DOE Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit the Office of Science website.
Michael Cooke, senior technical advisor for the DOE Office of the Deputy Director for Science Programs