Performance analysis and optimization for scalable deployment of deep learning models for country‐scale settlement mapping ...

Publication Type
Journal Name
Concurrency and Computation: Practice and Experience
Publication Date

This paper presents a scalable workflow for detecting objects (e.g., human settlement) in remotely sensed (RS) imagery. We have successfully deployed this workflow on the Titan supercomputer and used it to map human settlement at country scale. The performance of the various stages of the workflow is analyzed before the workflow is made operational. The workflow implements strategies to address sub-optimal resource utilization and the long-tail effect caused by an unbalanced image workload, data loss due to runtime failures, and the maximum walltime constraint imposed by Titan's job scheduling policy. A mean-shift clustering-based static load balancing strategy partitions the image load so that each partition contains similarly sized images. Furthermore, a checkpoint-restart (CR) strategy is added to the workflow as a fault tolerance mechanism to prevent data loss from unforeseen runtime failures. The performance of these strategies is observed in scenarios such as node failure, exceeded walltime, and successful completion. In its current state, the workflow detects human settlement across 752,618 $\mathrm{km}^2$ (the entirety of Zambia) in 6 hours.
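The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: each image is described by a single scalar size, mean shift uses a flat kernel with a hand-picked `bandwidth`, and the checkpoint is a JSON file listing completed image names. The function names `mean_shift_1d`, `partition_by_size`, and `run_with_checkpoint` are hypothetical.

```python
import json
import os


def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on scalars; returns one cluster label per value."""
    modes = []
    for v in values:
        m = float(v)
        for _ in range(max_iter):
            # Shift the estimate to the mean of all points within the bandwidth.
            window = [x for x in values if abs(x - m) <= bandwidth]
            if not window:
                break
            new_m = sum(window) / len(window)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    # Points whose modes lie within one bandwidth share a cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def partition_by_size(image_sizes, bandwidth):
    """Group image names so each partition holds similarly sized images.

    image_sizes maps a name to a scalar size (an assumed feature).
    """
    labels = mean_shift_1d(list(image_sizes.values()), bandwidth)
    partitions = {}
    for name, label in zip(image_sizes, labels):
        partitions.setdefault(label, []).append(name)
    return list(partitions.values())


def run_with_checkpoint(images, process, checkpoint_path):
    """Process images, recording completed names so a restarted job skips them."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    for name in images:
        if name in done:
            continue  # finished in an earlier run
        process(name)
        done.add(name)
        # Write to a temporary file, then rename, so a failure mid-write
        # cannot leave a corrupted checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, checkpoint_path)
    return done
```

In this sketch, a job killed by a node failure or by the walltime limit can be resubmitted with the same `checkpoint_path`, and only the unfinished images are processed again. Writing the checkpoint after every image keeps the example simple; a production workflow would more likely checkpoint at a coarser interval.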