A Vision for Managing Extreme-Scale Data Hoards

Publication Type

Conference Paper

Book Title

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)

Publication Date

July, 2019

Page Numbers

1806 to 1817

Conference Name

International Conference on Distributed Computing Systems (ICDCS 2019)

Conference Location

Dallas, Texas, United States of America

Conference Sponsor

IEEE

Conference Date

Jul 7, 2019 - Jul 9, 2019

Abstract

Scientific data collections grow ever larger, both in terms of the size of individual data items and of the number and complexity of items. To use and manage them, it is important to directly address issues of robust and actionable provenance. We identify three key drivers as our focus: managing the size and complexity of metadata, lack of a priori information to match usage intents between publishers and consumers of data, and support for campaigns over collections of data driven by multi-disciplinary, collaborating teams. We introduce the Hoarde abstraction as an attempt to formalize a way of looking at collections of data to make them more tractable for later use. Hoarde leverages middleware and systems infrastructures for scientific and technical data management. Through the lens of a select group of challenging data usage scenarios, we discuss some of the aspects of implementation, usage, and forward portability of this new view on data management.
