Abstract
Filtered Back-Projection (FBP) is a fundamental, compute-intensive algorithm used in tomographic image reconstruction. Cone-Beam Computed Tomography (CBCT) devices use a cone-shaped X-ray beam, in contrast to the parallel beam used in older CT generations. Distributed image reconstruction of cone-beam datasets typically relies on dividing batches of images across different nodes. This simple input decomposition, however, limits input/output sizes and scalability.
We propose a novel decomposition scheme and reconstruction algorithm for distributed FBP. This scheme enables arbitrarily large input/output sizes, eliminates the redundancy arising in the end-to-end pipeline, and improves scalability by replacing two communication collectives with a single segmented reduction. Finally, we implement the proposed decomposition scheme in a framework that is useful for all current-generation CT devices (7th gen). In our experiments using up to 1024 GPUs, our framework can reconstruct 4096³ volumes, for real-world datasets, in under 16 seconds (including I/O).