The scientific community is experiencing an unprecedented growth in the amount of data generated by cutting-edge science facilities. Soon, facilities will produce up to 1 PB/s, forcing scientists to adopt more autonomous techniques to learn from the data. The adoption of machine learning methods, such as deep learning, in large-scale workflows brings a shift in the workflows' computational and I/O patterns. These changes often involve iterative processes and model-architecture searches, in which datasets are analyzed multiple times, in different formats and with different model configurations, in order to find accurate, reliable, and efficient learning models. This shift in behavior changes I/O patterns at both the application level and the system level, and poses new challenges for HPC I/O teams, since these patterns contain more complex I/O workloads. In this paper we discuss the I/O patterns of emerging analytical codes that rely on machine learning algorithms and highlight the challenges of designing efficient I/O transfers for such workflows. We comment on how to leverage data access patterns to fetch the required input data more efficiently, in the format and order dictated by the needs of the application, and how to optimize the data path between collaborating processes. We motivate our work and demonstrate performance gains with a case study of medical applications.