Creating a Tools Ecosystem for Cross-Discipline Environmental Data Reuse

Publication Type

Conference Paper

Book Title

2021 IEEE International Conference on Big Data (Big Data)

Publication Date

December 2021

Page Numbers

3705 to 3708

Publisher Location

New Jersey, United States of America

Conference Name

IEEE International Conference on Big Data

Conference Location

Virtual (originally planned for Orlando, Florida, United States of America)

Conference Sponsor

IEEE

Conference Date

Dec 15, 2021 - Dec 18, 2021

Abstract

Reusing data is difficult even within well-defined science communities, and the difficulty only grows when data from multiple communities and disciplines are combined. Drawing on current work to construct an environmental epidemiological data set from multiple disciplinary sources, we demonstrate the need for a new tool ecosystem to support heterogeneous Big Data science. Extending existing community standards for schemas and/or data formats through human auditing and wrangling of the data is not feasible at scale. This work therefore proposes new approaches that multidisciplinary communities can use to build a shared tool ecosystem for Big Data. We discuss both the broader context of wrangling epidemiological data sets for novel artificial intelligence algorithms and the specific lessons learned from working with these multidisciplinary data sets. Adopting a more model-driven, automatable approach promises not only greater efficiency but also the removal of key sources of human-generated error, along with improved reuse and reproducibility of scientific data.