Estimating Lossy Compressibility of Scientific Data Using Deep Neural Networks

by Jieyang Chen, David R Pugmire, Norbert Podhorszki, Scott A Klasky

Publication Type: Journal

Journal Name: IEEE Letters of the Computer Society

Publication Date: February 2020

Page Numbers: 5 to 8

Abstract

Simulation-based scientific applications generate increasingly large amounts of data on high-performance computing (HPC) systems. To allow data to be stored and analyzed efficiently, data compression is often used to reduce the volume and velocity of data. However, domain scientists often ask what level of compression they can expect, so that they can make more informed decisions that balance accuracy and performance. In this letter, we propose a deep neural network-based approach for estimating the compressibility of scientific data. To train the neural network, we build both general features and compressor-specific features so that the characteristics of both the data and the lossy compressors are captured in training. Our approach is demonstrated to outperform both a prior analytical model and a sampling-based approach in the case of biased estimation (i.e., for SZ). However, for unbiased estimation (i.e., for ZFP), the sampling-based approach yields the best accuracy, despite the high overhead involved in sampling the target dataset.
