
Domain-Specific Type-Safe APIs for Hierarchical Scientific Data with Modern C++...

by William F Godoy, Aditi Malviya, Steven E Hahn
Publication Type
Conference Paper
Book Title
Responsible Data Science
Publication Date
Page Numbers
191 to 204
Publisher Location
Conference Name
7th International Conference on Data Science and Engineering (ICDSE 2021)
Conference Location
Patna (Virtual Event), India
Conference Sponsor
Indian Institute of Technology Patna, Queen's University Belfast, Cochin University of Science and Technology
Conference Date

General-purpose application programming interfaces (APIs) for self-describing hierarchical scientific data storage, such as those of the HDF5 and NetCDF libraries, traditionally resolve entry names and data types at runtime. As a result, errors in entry existence and data types are typically caught late in the development of higher-level application-specific APIs. In this paper, we propose exploiting modern C++ metaprogramming features to add compile-time type safety, improving interaction with a well-defined, metadata-rich scientific schema in domain-specific hierarchical datasets. We tackle two aspects of common use: (i) direct data access and (ii) flexible “in-memory” index models for efficient search and data processing. The proposed APIs use C++17’s template type auto deduction features, C++11’s enum class for type safety, and C-style preprocessor macros for generative templated code. We showcase the pros and cons of our initial work on the standard NeXus schema, which is built on top of HDF5 and used for annotating and storing experimental neutron scattering data at several facilities around the world. Extendable compile-time type-safe APIs are desirable because they can be indexed by any modern integrated development environment (IDE). Hence, such APIs can ease the learning curve for domain scientists by offering less error-prone software interaction that enhances the findability of their data, without resorting to a domain-specific language (DSL).
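
The combination of techniques named in the abstract can be illustrated with a minimal C++17 sketch. It is not the API proposed in the paper: the `Entry` enum, the `REGISTER_ENTRY` macro, the `MockFile` class, and the dataset paths are all hypothetical, and an in-memory map stands in for an HDF5 file so that the example runs without external libraries. The sketch also assumes that the C++17 deduction feature in question is the `template <auto>` non-type parameter. The intent is only to show how an `enum class`, a macro-generated trait, and a template parameter can fix an entry's type at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical schema entries (illustrative names, not taken from the paper).
enum class Entry { Title, ProtonCharge, EventIds };

// Primary template is declared but never defined, so using an
// unregistered entry is a compile-time error rather than a runtime one.
template <auto E> struct EntryTraits;

// C-style macro that generates one trait specialization per schema entry.
#define REGISTER_ENTRY(ENTRY, TYPE, PATH)                \
  template <> struct EntryTraits<Entry::ENTRY> {         \
    using type = TYPE;                                   \
    static constexpr const char *path = PATH;            \
  };

// Paths are assumptions chosen to resemble a hierarchical layout.
REGISTER_ENTRY(Title, std::string, "/entry/title")
REGISTER_ENTRY(ProtonCharge, double, "/entry/proton_charge")
REGISTER_ENTRY(EventIds, std::vector<std::uint32_t>, "/entry/events/event_id")

#undef REGISTER_ENTRY

// Stand-in for a hierarchical file: a path-to-value map held in memory.
class MockFile {
public:
  // The value type is dictated by the entry, not chosen by the caller.
  template <auto E> void Put(typename EntryTraits<E>::type value) {
    store_[EntryTraits<E>::path] = std::move(value);
  }

  // The return type is resolved at compile time from the entry.
  template <auto E> const typename EntryTraits<E>::type &Get() const {
    using T = typename EntryTraits<E>::type;
    return std::get<T>(store_.at(EntryTraits<E>::path));
  }

private:
  using Value =
      std::variant<std::string, double, std::vector<std::uint32_t>>;
  std::map<std::string, Value> store_;
};

// Example usage: writes two entries and reads them back with deduced types.
inline double ExampleUsage() {
  MockFile file;
  file.Put<Entry::Title>("sample run");
  file.Put<Entry::ProtonCharge>(1.5);

  // Checked by the compiler: the Title entry is read as a std::string.
  static_assert(std::is_same_v<decltype(file.Get<Entry::Title>()),
                               const std::string &>);
  assert(file.Get<Entry::Title>() == "sample run");

  const auto &charge = file.Get<Entry::ProtonCharge>(); // const double &
  return charge;
}
```

In this sketch, a call such as `file.Put<Entry::ProtonCharge>(std::string("x"))` fails to compile, which is the kind of error a purely runtime API would report only during execution. Because the entries are ordinary enumerators, an IDE can list them through auto-completion.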