The GPU solver within the Shift continuous-energy Monte Carlo neutron transport code has been extended to support domain decomposition in addition to domain replication, enabling the solution of problems whose memory requirements exceed the capacity of a single GPU. The strategy follows the Multiple Set, Overlapping Domain (MSOD) approach used in Shift’s CPU solver and integrates into the event-based algorithm used by Shift’s GPU solver. The ability to assign processors to spatial domains non-uniformly has been maintained. Two approaches for communicating particle data between domains are considered, and multiple load-balancing criteria are investigated. Numerical results are presented for both fresh and depleted small modular reactor (SMR) cores. A parallel efficiency of approximately 80% was achieved with up to 16 spatial domains, measured relative to full domain replication. A scaling study on the Summit supercomputer demonstrates a weak-scaling parallel efficiency of over 90% on more than 24,000 GPUs.