Distributed Training for High Resolution Images: A Domain and Spatial Decomposition Approach...

by Jacob D Hinkle, Aristeidis Tsaris
Publication Type
ORNL Report
Publication Date

In this work we developed two PyTorch libraries, both built on the PyTorch RPC interface, for distributed deep learning on high-resolution images. The spatial decomposition library enables distributed training on very large images, which would otherwise not be possible on a single GPU. The domain parallelism library enables distributed training on unlabeled data from multiple domains by leveraging the domain separation architecture. Both libraries were tested on the Summit supercomputer at moderate scale, and we are releasing the code for both.
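
To make the idea of spatial decomposition concrete, the following is a minimal sketch of how a large image might be partitioned into a grid of tiles, one per worker. It is not the API of the released libraries: the function names `split_extent` and `tile_boxes`, the 2x2 grid, and the `halo` overlap (extra border pixels so that convolutions near a tile edge can see neighboring data) are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical helper types and names, chosen for illustration only.
Box = Tuple[int, int, int, int]  # (row_start, row_stop, col_start, col_stop)


def split_extent(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(size) into `parts` contiguous, near-equal intervals."""
    base, extra = divmod(size, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` intervals get one additional element.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def tile_boxes(height: int, width: int, grid_rows: int, grid_cols: int,
               halo: int = 0) -> List[Box]:
    """Return one bounding box per tile, in row-major order.

    Each box is widened by `halo` pixels on every side (an assumption of
    this sketch) and clipped to the image boundary.
    """
    boxes = []
    for r0, r1 in split_extent(height, grid_rows):
        for c0, c1 in split_extent(width, grid_cols):
            boxes.append((max(r0 - halo, 0), min(r1 + halo, height),
                          max(c0 - halo, 0), min(c1 + halo, width)))
    return boxes


if __name__ == "__main__":
    # An 8192 x 8192 image split across a hypothetical 2 x 2 grid of workers.
    for rank, box in enumerate(tile_boxes(8192, 8192, 2, 2, halo=1)):
        print(rank, box)
```

In a sketch like this, each worker would hold only the pixels inside its own box, so per-device memory scales with the tile size rather than the full image size.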