In this paper we describe and demonstrate a discrete ordinates sweep algorithm on GPUs. This sweep algorithm is nested within a multilevel comunication-based decomposition based on energy. We demonstrated the effectiveness of this algorithm on detailed three-dimensional critical experiments and PWR lattice problems. For these problems we show improvement factors of 4–6 over conventional communication-based, CPU-only sweeps. These sweep kernel speedups resulted in a factor of 2 total time-to-solution improvement.