

A Distribution Oblivious Scalable Approach for Large-Scale Scientific Data Processing

Problem Statement:

  • Runtimes of scientific data processing (SDP) methods vary with the distribution of the data, even when the data size is held constant. This sensitivity of execution time to data distribution is a central issue in designing efficient, scalable algorithms for large-scale SDP.
  • Making effective use of today’s massive computing infrastructures therefore requires large-scale data processing capabilities whose performance is independent of the underlying data distribution characteristics.
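The distribution sensitivity described above can be illustrated with a small sketch. It assumes a uniform grid as the spatial index, which is a common choice but not one named in this project description; the function name `max_cell_occupancy`, the grid resolution, and the synthetic point sets are all hypothetical. The two datasets have the same size, yet the fullest grid cell (a rough proxy for the worst-case work of a neighborhood query) differs widely.

```python
import random
from collections import Counter


def max_cell_occupancy(points, cells_per_axis):
    """Bin 2-D points in [0, 1)^2 into a uniform grid and return the count of the fullest cell."""
    counts = Counter(
        (
            min(int(x * cells_per_axis), cells_per_axis - 1),
            min(int(y * cells_per_axis), cells_per_axis - 1),
        )
        for x, y in points
    )
    return max(counts.values())


def clamp(value):
    """Keep a coordinate inside the unit interval."""
    return min(max(value, 0.0), 0.999999)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
n = 10_000              # same data size for both distributions (assumed value)

uniform = [(rng.random(), rng.random()) for _ in range(n)]
clustered = [
    (clamp(rng.gauss(0.5, 0.02)), clamp(rng.gauss(0.5, 0.02))) for _ in range(n)
]

uniform_max = max_cell_occupancy(uniform, 32)
clustered_max = max_cell_occupancy(clustered, 32)
print("fullest cell, uniform data:  ", uniform_max)
print("fullest cell, clustered data:", clustered_max)
```

In this sketch a query that touches the fullest cell must scan far more points for the clustered data than for the uniform data, even though both datasets contain the same number of points.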

Technical Approach:

  • View the organization of higher-level data analysis applications as a hierarchical stack of functional layers.
  • Focus on the deepest layer of this stack, which interacts directly with the supercomputing hardware and consists of the parallel spatial neighborhood/region queries common to most higher-level SDP applications.
  • Based on distribution-independent parallel spatial data structures, develop core parallel region query algorithms whose runtimes are data distribution-independent.
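As a rough illustration of a region query over a spatial structure whose shape does not depend on the data distribution, the following is a minimal serial sketch. It assumes a k-d tree built by median splits, a technique chosen here only for illustration; the project description does not name its data structures, and this sketch is not the project's parallel algorithm. All names (`build_kdtree`, `region_query`, `tree_depth`, `leaf_size`) are hypothetical.

```python
import random


def build_kdtree(points, depth=0, leaf_size=16):
    """Split at the median along a cycling axis so both halves hold equal point counts."""
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": points[mid][axis],
        "left": build_kdtree(points[:mid], depth + 1, leaf_size),   # values <= split
        "right": build_kdtree(points[mid:], depth + 1, leaf_size),  # values >= split
    }


def tree_depth(node):
    """Number of levels in the tree, counting a leaf as one level."""
    if node["leaf"]:
        return 1
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))


def region_query(node, lo, hi):
    """Return every point p with lo[d] <= p[d] <= hi[d] on each axis d."""
    if node["leaf"]:
        return [
            p for p in node["points"]
            if all(l <= c <= h for l, c, h in zip(lo, p, hi))
        ]
    axis, split = node["axis"], node["split"]
    found = []
    if lo[axis] <= split:  # the box may overlap the left half
        found += region_query(node["left"], lo, hi)
    if hi[axis] >= split:  # the box may overlap the right half
        found += region_query(node["right"], lo, hi)
    return found


rng = random.Random(1)
n = 4096  # assumed dataset size for the demonstration
uniform = [tuple(rng.random() for _ in range(3)) for _ in range(n)]
clustered = [tuple(rng.gauss(0.5, 0.05) for _ in range(3)) for _ in range(n)]

uniform_tree = build_kdtree(uniform)
clustered_tree = build_kdtree(clustered)
lo, hi = (0.45, 0.45, 0.45), (0.55, 0.55, 0.55)
hits = region_query(clustered_tree, lo, hi)
print("tree depths:", tree_depth(uniform_tree), tree_depth(clustered_tree))
print("points in query box (clustered data):", len(hits))
```

Because each split is chosen by point count rather than by position, the tree depth here is determined by the number of points and `leaf_size` alone. A balanced structure is only one ingredient: the query cost in this sketch still grows with the number of points returned, and distributing the work across processors is outside its scope.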


  • The resulting algorithms will enable the design of efficient, scalable SDP frameworks with data distribution-independent performance characteristics. As a proof of principle, this new capability will be used to analyze massive atom probe tomography datasets in unprecedented runtimes.


