
Creating a Tools Ecosystem for Cross-Discipline Environmental Data Reuse...

Publication Type
Conference Paper
Journal Name
Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD)
Book Title
2021 IEEE International Conference on Big Data (Big Data)
Page Numbers
3705 to 3708
Publisher Location
New Jersey, United States of America
Conference Name
IEEE International Conference on Big Data
Conference Location
Virtual (Formerly Orlando, FL), Florida, United States of America

Reusing data is difficult even within well-defined science communities, and it only gets harder when combining data from multiple communities and disciplines. Through the lens of current work on constructing an environmental epidemiological data set from multiple disciplinary sources, we demonstrate the need for a new tool ecosystem to support heterogeneous Big Data science. Extending existing community standards for schemas and/or data formats through human auditing and wrangling of the data is not feasible at scale. This work therefore suggests new approaches that multi-disciplinary communities can use to build a shared tool ecosystem for Big Data. We discuss both the broader context of wrangling epidemiological data sets for novel artificial intelligence algorithms and the specific lessons learned from working with these multi-disciplinary data sets. Adopting a more model-driven, automatable approach promises not only better efficiency but also the removal of key sources of human-generated error, and it promotes the reuse and reproducibility of science data.