Abstract
General-purpose library application programming interfaces (APIs) for self-describing hierarchical scientific data storage, such as the HDF5 and NetCDF libraries, are traditionally of runtime nature. Runtime errors for entry existence and data types are typically caught later in the development process of higher-level application-specific APIs. In this paper, we propose exploiting modern C++ metaprogramming features to add compile-time type-safety to improve the interaction with a well-defined metadata-rich scientific schema in domain-specific hierarchical datasets. We tackle two aspects of common use: (i) direct data access, (ii) flexible “in-memory” index models for efficient search and data processing. The proposed APIs use C++17’s template type auto deduction features, C++11’s enum class for type-safety and C-style preprocessor macros for generative templated code. We showcase the pros and cons of our initial work on the standard NeXus schema used for annotating and storing experimental neutron scattering data at several facilities around the world on top of HDF5. Extendable compile-time type-safe APIs are a desirable feature that could be indexed by any modern integrated development environment (IDE). Hence, such APIs can help ease the learning curve for domain scientists using a less error-prone software interaction to enhance the findability of their data without resorting to a domain-specific language (DSL).