
CAFE AU LAIT: Compute-Aware Federated Augmented Low-Rank AI Training

by Jiayi Wang, John P Gounley, Heidi A Hanson
Publication Type
Conference Paper
Book Title
PASC '25: Proceedings of the Platform for Advanced Scientific Computing Conference
Publication Date
Page Numbers
1 to 12
Publisher Location
New York, New York, United States of America
Conference Name
Platform for Advanced Scientific Computing (PASC)
Conference Location
Brugg, Switzerland
Conference Sponsor
CSCS and ACM
Conference Date
-

Federated finetuning is crucial for unlocking the knowledge embedded in pretrained Large Language Models (LLMs) when data are geographically distributed across clients. Unlike finetuning with data from a single institution, federated finetuning allows collaboration across multiple institutions, enabling the utilization of diverse and decentralized datasets while preserving data privacy. Given the high computing costs of LLM training and the emphasis on energy efficiency in Federated Learning (FL), Low-Rank Adaptation (LoRA) has emerged as a widely adopted algorithm due to its significantly reduced number of trainable parameters. However, this approach assumes that all data silos have the computing resources needed to compute local updates of LLMs. In practice, computing resources across clients are highly heterogeneous: while some clients may have access to hundreds of GPUs, others may have limited or no GPU access. Recently, federated finetuning using synthetic data has been proposed, allowing clients to participate in a collaborative training run without training LLMs locally. However, our experimental results reveal a performance gap between models trained using synthetic data and those trained using local updates. Motivated by the observed heterogeneity in computing resources and this performance gap, we propose a novel two-stage algorithm that leverages the storage and computing capabilities of a strong server. In the first stage, under the coordination of the strong server, clients with limited computing resources collaborate to generate synthetic data, which is transferred to and stored on the strong server. In the second stage, the strong server uses this synthetic data on behalf of the resource-constrained clients to perform federated LoRA finetuning alongside clients with sufficient computing resources. This approach ensures that all clients can participate in the finetuning process.
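To illustrate why LoRA reduces the number of trainable parameters, the following is a minimal NumPy sketch of the general LoRA idea (not the paper's implementation): a frozen weight matrix W is adapted by a trainable low-rank product B @ A, so only r·(d_out + d_in) parameters are trained instead of d_out·d_in. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper): a 64x64 layer
# adapted with rank r = 4.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero init: W + B @ A == W at start

def adapted_forward(x):
    """Forward pass through the layer with the LoRA update applied."""
    return x @ (W + B @ A).T

full_params = d_out * d_in          # parameters of full finetuning
lora_params = r * (d_out + d_in)    # parameters LoRA actually trains
print(full_params, lora_params)     # LoRA trains far fewer parameters
```

Because B is initialized to zero, the adapted model matches the pretrained model exactly before training, and only A and B are updated (and exchanged in the federated setting), which is what makes LoRA attractive for communication- and energy-constrained FL.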
Experimental results demonstrate that incorporating local updates from even a small fraction of clients improves performance compared to using synthetic data for all clients. Furthermore, we incorporate the Gaussian mechanism in both stages to guarantee client-level differential privacy.
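The Gaussian mechanism for client-level differential privacy can be sketched as follows: each client's contribution is clipped to a bounded L2 norm, the clipped contributions are averaged, and Gaussian noise calibrated to the clipping bound is added to the aggregate. This is a generic, hedged sketch of that standard mechanism; the clipping norm, noise multiplier, and function names are illustrative assumptions, not the paper's specific implementation.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale an update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Average clipped client updates and add Gaussian noise.

    clip_norm bounds each client's contribution; noise_multiplier
    controls the privacy/utility trade-off (illustrative values).
    """
    rng = rng or np.random.default_rng(0)
    clipped = [clip_update(u, clip_norm) for u in updates]
    mean = np.mean(clipped, axis=0)
    # Noise standard deviation scales with the per-client sensitivity
    # (clip_norm / number of clients) times the noise multiplier.
    sigma = noise_multiplier * clip_norm / len(updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Clipping bounds any single client's influence on the aggregate, which is what makes the added Gaussian noise yield a client-level (rather than example-level) differential privacy guarantee.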