Publication

yProv4ML: Effortless provenance tracking for machine learning systems...

by Gabriele Padovani, Valentine G Anantharaj, Sandro Fiore
Publication Type
Journal
Journal Name
SoftwareX
Publication Date
Page Number
102298
Volume
31

Interest in deep learning, and in foundation models (FMs) in particular, has grown rapidly, and the generalization ability of these models has attracted a diverse range of researchers. However, the advent of these techniques has also exposed a lack of transparency and rigor in how models are developed. In particular, the number of epochs and other hyperparameters cannot be determined in advance, which makes it difficult to identify the best model. Machine learning frameworks such as MLflow address this challenge by automating the collection of this type of information. However, these tools capture data in proprietary formats and pay little attention to lineage. This paper proposes yProv4ML, a framework that captures the provenance information generated during machine learning processes in PROV-JSON format with minimal code modification.
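
To make the idea of recording a training run as PROV-JSON more concrete, the following is a minimal sketch of what such a record might look like. It uses only the Python standard library and does not use or reproduce the yProv4ML API; the function `build_prov_document`, the `ex:` prefix, and the type names (`ex:TrainingRun`, `ex:Hyperparameter`, `ex:Metric`, `ex:Model`) are hypothetical and chosen purely for illustration. Only the top-level layout (`prefix`, `entity`, `activity`, `used`, `wasGeneratedBy`) follows the PROV-JSON serialization.

```python
import json


def build_prov_document(run_id, hyperparameters, metrics):
    """Build a minimal PROV-JSON dict for one training run.

    Illustrative only: identifiers and type names are assumptions,
    not the vocabulary used by yProv4ML.
    """
    activity_id = f"ex:{run_id}"
    model_id = f"ex:{run_id}_model"

    doc = {
        "prefix": {"ex": "http://example.org/"},
        # The training run itself is modeled as a PROV activity.
        "activity": {activity_id: {"prov:type": "ex:TrainingRun"}},
        # The trained model is an entity generated by that activity.
        "entity": {model_id: {"prov:type": "ex:Model"}},
        "used": {},
        "wasGeneratedBy": {
            "_:gen_model": {
                "prov:entity": model_id,
                "prov:activity": activity_id,
            }
        },
    }

    # Hyperparameters are inputs: entities that the run "used".
    for name, value in hyperparameters.items():
        param_id = f"ex:{run_id}_param_{name}"
        doc["entity"][param_id] = {
            "prov:type": "ex:Hyperparameter",
            "ex:value": value,
        }
        doc["used"][f"_:use_{name}"] = {
            "prov:activity": activity_id,
            "prov:entity": param_id,
        }

    # Metrics are outputs: entities "generated by" the run.
    for name, value in metrics.items():
        metric_id = f"ex:{run_id}_metric_{name}"
        doc["entity"][metric_id] = {
            "prov:type": "ex:Metric",
            "ex:value": value,
        }
        doc["wasGeneratedBy"][f"_:gen_{name}"] = {
            "prov:entity": metric_id,
            "prov:activity": activity_id,
        }

    return doc


if __name__ == "__main__":
    example = build_prov_document(
        "run1", {"epochs": 10, "lr": 0.001}, {"val_loss": 0.42}
    )
    print(json.dumps(example, indent=2))
```

Because hyperparameters are linked through `used` and results through `wasGeneratedBy`, the lineage of a metric or model can be traced back to the settings that produced it, which is the kind of relationship a flat log of parameters does not express.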