National Microbiome Data Collaborative

Topic: Clean Energy

The National Microbiome Data Collaborative (NMDC) is a new Department of Energy (DOE) initiative led by Lawrence Berkeley National Laboratory (LBNL) in partnership with Oak Ridge (ORNL), Los Alamos (LANL), and Pacific Northwest (PNNL) national laboratories. The NMDC will leverage DOE’s existing data-science resources and high-performance computing systems to develop a framework that facilitates more efficient use of microbiome data for applications in energy, environment, health, and agriculture.

Nearly every ecosystem and organism on Earth hosts a diverse community of microorganisms – its microbiome. Yet we know little about the functions of individual microbes, let alone how they interact with each other, their hosts, or their environments, and how their activity varies over time or in response to perturbations. The past decade has seen tremendous advances in genome and metagenome DNA-sequencing technologies, which have led to an unprecedented volume of microbiome data being generated. However, further progress in the field has been hindered by the lack of computational infrastructure for processing and performing integrative analyses of these and other microbiome-relevant data.

The NMDC will tackle this data integration challenge by developing a community-centric framework based on large-scale, collaborative partnerships that draw on the capabilities, expertise, and resources of four DOE national laboratories. The guiding principles at the initiative’s core are: making data findable, accessible, interoperable, and reusable (FAIR); connecting data and compute resources; and community engagement that supports open science and shared ownership.

Capabilities not currently available to the microbiome research community that NMDC will enable include:

  • Aggregating and viewing both taxonomic and functional profiles of unassembled and assembled metagenome sequence data to gain new insights into microbiome composition and function.
  • Accessing, analyzing, and integrating multi-omics data sets (metagenome, metatranscriptome, metaproteome, metabolome, and environmental data) to discover community dynamics, metabolic networks, and other microbe-microbe, microbe-host, and microbe-environment interactions.
  • Accelerating search through linked data using existing and enhanced ways to describe microbiome data sets, diversifying the sample space and depth for new discoveries.

Phase One

The first phase of the project, a 27-month pilot, will focus on four aims: designing metadata standards; designing and deploying data-processing workflows; facilitating data integration and access; and delivering multiple opportunities for community engagement. Berkeley Lab houses several key resources for this pilot phase, most notably two data analysis platforms (Integrated Microbial Genomes & Microbiomes and the DOE Systems Biology Knowledgebase), data provided by the DOE Joint Genome Institute (JGI), and data standards through participation in the Gene Ontology Consortium. Importantly, Berkeley Lab will lead the first phase of the NMDC and will execute all related activities in keeping with our commitment to diversity, equity, inclusion, and accountability.

Aim 1 leads Alison Boyer (ORNL), Lee Ann McCue (PNNL), and Chris Mungall (Berkeley Lab) will oversee the application of existing ontology mapping tools and curation resources to automate annotation of metadata to comply with FAIR principles. Aim 2 leads Patrick Chain (LANL) and Shane Canon (Berkeley Lab) will guide the design of workflows that leverage high-performance computing systems to generate integrated, interoperable, and reusable microbiome data. Aim 3 lead Kjiersten Fagnan (Berkeley Lab) will spearhead the development of a scalable infrastructure and web-based graphical user interface to enable scientists to explore and interact with the NMDC data. Stanton Martin (ORNL) will provide guidance and support across Aims 1-3 as FAIR strategic team lead.

Aim 4 lead Elisha Wood-Charlson (Berkeley Lab) is responsible for the NMDC’s communication strategy for raising community awareness and engagement. Upcoming events include an October 2019 workshop on Merging Ontologies, a December 2019 American Geophysical Union (AGU) session on Creating Data Synchronicity Across Earth Microbiome Research (FAIR data), and a related session at the Ocean Sciences Meeting in February 2020.

For more information, see: https://newscenter.lbl.gov/2019/08/13/community-driven-data-science/