Scalable Data-Intensive Geocomputation: A Design for Real-Time Continental Flood Inundation Mapping

by Yan Liu, Jibonananda Sanyal

Publication Type

Conference Paper

Book Title

Proceedings of the Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020

Publication Date

August 2020

Publisher Location

New York, United States of America

Conference Name

Smoky Mountains Computational Sciences & Engineering Conference 2020 (SMC2020)

Conference Location

Kingsport, Tennessee, United States of America

Conference Sponsor

Oak Ridge National Laboratory

Conference Date

Aug 26, 2020 - Aug 28, 2020

Abstract

The convergence of data-intensive and extreme-scale computing enables an integrated software and data ecosystem for scientific discovery. Developments in this realm will fuel transformative research in data-driven interdisciplinary domains. Geocomputation provides computing paradigms in Geographic Information Systems (GIS) for interactive computing of geographic data, processes, models, and maps. Because GIS is data-driven, the computational scalability of a geocomputation workflow is directly related to the scale of the GIS data layers, their resolution and extent, and the velocity of the geo-located data streams to be processed. With their distinctive requirements for high user interactivity and low end-to-end latency, geocomputation applications will benefit dramatically from the convergence of high-end data analytics (HDA) and high-performance computing (HPC). The application-level challenge, however, is to identify and eliminate the computational bottlenecks that arise along a geocomputation workflow: poor scalability in any workflow component is detrimental to the entire end-to-end pipeline. Here, we study a large geocomputation use case in flood inundation mapping that handles multiple national-scale geospatial datasets and targets low end-to-end latency. We discuss the benefits and challenges of harnessing both HDA and HPC for data-intensive geospatial data processing and intensive numerical modeling of geographic processes. We propose an HDA+HPC geocomputation architecture design that couples HDA-based (e.g., Spark) spatial data handling with HPC-based parallel data modeling. Key techniques for coupling HDA and HPC to bridge the two different software stacks are reviewed and discussed.
