Atom probe tomography (APT) is a materials probing technique that has undergone dramatic improvements in its capability to map individual atoms within a material sample, resulting in data files containing hundreds of millions of atoms. Understanding the nano-structural features hidden in these massive volumes of atomic data is a crucial analysis task for materials scientists. However, the lack of fast analysis capabilities for large APT workloads remains a critical bottleneck. In this paper, we present the design, implementation, and detailed performance evaluation of parallel software that efficiently performs extremely time-consuming correlation analyses of massive, high-density APT data. Starting with shared-memory implementations to motivate our design choices, we extend the implementation to hybrid architectures with realistic APT workloads in mind. Detailed performance analyses of three different parallel implementations of the software are supported by empirical results on Cray XC30 and Cray XC40 systems. We demonstrate the software's usefulness by reducing the turnaround time of an end-to-end APT correlation analysis of 100 million atoms by three orders of magnitude, using 2048 MPI ranks on 1024 nodes (24 cores per node) of a Cray XC30. The software reported here equips materials scientists, for the first time, with a high-speed, scalable capability for efficient and timely analysis of massive APT data.