Supercomputing and Computation
A Distribution Oblivious Scalable Approach for Large-Scale Scientific Data Processing
June 12, 2013
- Runtimes of scientific data processing (SDP) methods vary depending on data distribution characteristics, even when data size remains constant. A central issue in the scalable design of efficient large-scale SDP algorithms is the sensitivity of their execution times to data distribution.
- To gainfully utilize today’s massive computing infrastructures, a major challenge is to develop large-scale data processing capabilities that are independent of the underlying data distribution characteristics.
- The approach views higher-level data analysis applications as organized into a hierarchical stack of functional layers.
- The project focuses on the deepest layer of this stack, which interacts directly with the supercomputing hardware and consists of the parallel spatial neighborhood/region queries common to most higher-level SDP applications.
- Develop core parallel region query algorithms, built on distribution-independent parallel spatial data structures, whose runtimes do not depend on the data distribution.
- This work will enable the design of efficient, scalable SDP frameworks whose performance characteristics are independent of data distribution. As a proof of principle, the new capability will be used to analyze massive atom probe tomography datasets with unprecedentedly short runtimes.
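
To make the idea of a distribution-independent spatial structure concrete, the following is a minimal serial sketch, not the project's actual method. It assumes a median-split k-d tree over 3D points as the spatial data structure (the summary does not name one), and the names `build`, `tree_depth`, and `region_query` are hypothetical. Because every split is placed at the median, the depth of the tree depends only on the number of points, whether they are spread uniformly or tightly clustered. Query time still depends on how many points fall inside the region, so equal depth is only one ingredient of distribution-independent runtime.

```python
import random

DIMS = 3  # assumption: 3D points, e.g. atom positions

def build(points, depth=0, leaf_size=8):
    """Build a k-d tree by splitting at the median along a cycling axis.

    Median splits halve the point count at every level, so the tree
    depth depends only on len(points), not on where the points lie.
    """
    if len(points) <= leaf_size:
        return {"leaf": True, "points": list(points)}
    axis = depth % DIMS
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": pts[mid][axis],
        "left": build(pts[:mid], depth + 1, leaf_size),   # coords <= split
        "right": build(pts[mid:], depth + 1, leaf_size),  # coords >= split
    }

def tree_depth(node):
    """Number of internal levels above the deepest leaf."""
    if node["leaf"]:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

def in_box(p, lo, hi):
    return all(lo[i] <= p[i] <= hi[i] for i in range(DIMS))

def region_query(node, lo, hi):
    """Return all points inside the axis-aligned box [lo, hi]."""
    if node["leaf"]:
        return [p for p in node["points"] if in_box(p, lo, hi)]
    axis, split = node["axis"], node["split"]
    found = []
    if lo[axis] <= split:  # box may overlap the left half
        found += region_query(node["left"], lo, hi)
    if hi[axis] >= split:  # box may overlap the right half
        found += region_query(node["right"], lo, hi)
    return found

if __name__ == "__main__":
    random.seed(0)
    n = 2000
    uniform = [tuple(random.random() for _ in range(DIMS)) for _ in range(n)]
    clustered = [tuple(random.gauss(0.5, 0.01) for _ in range(DIMS)) for _ in range(n)]
    # Both trees have the same depth even though the inputs differ sharply.
    print(tree_depth(build(uniform)), tree_depth(build(clustered)))
```

A fixed uniform grid, by contrast, would place nearly all of the clustered points into a handful of cells, which is the kind of distribution sensitivity the project aims to avoid. A parallel version would additionally need to partition the tree across processes, which this sketch does not attempt.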