Performance Evaluation of Python Based Data Analytics Frameworks in Summit: Early Experiences...

Publication Type
Book Chapter
Publication Date
Page Numbers
366 to 380
Publisher Name
Springer International Publishing
Publisher Location
Cham, Switzerland

The explosion in the volume of data generated by ever-larger simulation campaigns and by experiments or observations necessitates competent tools for data wrangling and analysis. While the Oak Ridge Leadership Computing Facility (OLCF) provides a variety of tools for data wrangling and data analysis tasks, Python-based tools often lack scalability or the ability to fully exploit the computational capability of OLCF’s Summit supercomputer. NVIDIA RAPIDS and Dask offer a promising way to accelerate and distribute data analytics workloads on platforms ranging from personal computers to heterogeneous supercomputing systems. We discuss early performance evaluation results of RAPIDS and Dask on Summit to understand their capabilities, scalability, and limitations. Our evaluation covers a subset of the RAPIDS libraries (cuDF, cuML, and cuGraph) and Chainer’s CuPy, along with their multi-GPU variants when available. We also draw on the trends observed in the performance evaluation results to discuss best practices for maximizing performance.