
CAFE AU LAIT: Compute-Aware Federated Augmented Low-Rank AI Training

by Jiayi Wang, John P Gounley, Heidi A Hanson
Publication Type
Conference Paper
Book Title
PASC '25: Proceedings of the Platform for Advanced Scientific Computing Conference
Publication Date
Page Numbers
1 to 12
Publisher Location
New York, New York, United States of America
Conference Name
Platform for Advanced Scientific Computing (PASC)
Conference Location
Brugg, Switzerland
Conference Sponsor
CSCS and ACM
Conference Date
-

Federated finetuning is crucial for unlocking the knowledge embedded in pretrained Large Language Models (LLMs) when data are geographically distributed across clients. Unlike finetuning with data from a single institution, federated finetuning allows collaboration across multiple institutions, enabling the utilization of diverse and decentralized datasets while preserving data privacy. Given the high computing costs of LLM training and the emphasis on energy efficiency in Federated Learning (FL), Low-Rank Adaptation (LoRA) has emerged as a widely adopted algorithm due to its significantly reduced number of trainable parameters. However, this approach assumes that all data silos have the computing resources needed to compute local updates of LLMs. In practice, computing resources across clients are highly heterogeneous: while some clients may have access to hundreds of GPUs, others may have limited or no GPU access. Recently, federated finetuning using synthetic data has been proposed, allowing clients to participate in a collaborative training run without training LLMs locally. However, our experimental results reveal a performance gap between models trained using synthetic data and those trained using local updates. Motivated by the observed heterogeneity in computing resources and this performance gap, we propose a novel two-stage algorithm that leverages the storage and computing capabilities of a strong server. In the first stage, under the coordination of the strong server, clients with limited computing resources collaborate to generate synthetic data, which is transferred to and stored on the strong server. In the second stage, the strong server uses this synthetic data on behalf of the resource-constrained clients to perform federated LoRA finetuning alongside clients with sufficient computing resources. This approach ensures that all clients can participate in the finetuning process.
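To illustrate why LoRA reduces the number of trainable parameters, the following is a minimal NumPy sketch of the general LoRA idea (not the paper's implementation): a frozen weight matrix W is adapted by a trainable low-rank product B @ A, so only r·(d_out + d_in) parameters are trained instead of d_out·d_in. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper): a 64x64 layer
# adapted with rank r = 4.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero init: W + B @ A == W at start

def adapted_forward(x):
    """Forward pass through the layer with the LoRA update applied."""
    return x @ (W + B @ A).T

full_params = d_out * d_in          # parameters of full finetuning
lora_params = r * (d_out + d_in)    # parameters LoRA actually trains
print(full_params, lora_params)     # LoRA trains far fewer parameters
```

Because B is initialized to zero, the adapted model matches the pretrained model exactly before training, and only A and B are updated (and exchanged in the federated setting), which is what makes LoRA attractive for communication- and energy-constrained FL.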
Experimental results demonstrate that incorporating local updates from even a small fraction of clients improves performance compared to using synthetic data for all clients. Furthermore, we incorporate the Gaussian mechanism in both stages to guarantee client-level differential privacy.
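The Gaussian mechanism for client-level differential privacy can be sketched as follows: each client's contribution is clipped to a bounded L2 norm, the clipped contributions are averaged, and Gaussian noise calibrated to the clipping bound is added to the aggregate. This is a generic, hedged sketch of that standard mechanism; the clipping norm, noise multiplier, and function names are illustrative assumptions, not the paper's specific implementation.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale an update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Average clipped client updates and add Gaussian noise.

    clip_norm bounds each client's contribution; noise_multiplier
    controls the privacy/utility trade-off (illustrative values).
    """
    rng = rng or np.random.default_rng(0)
    clipped = [clip_update(u, clip_norm) for u in updates]
    mean = np.mean(clipped, axis=0)
    # Noise standard deviation scales with the per-client sensitivity
    # (clip_norm / number of clients) times the noise multiplier.
    sigma = noise_multiplier * clip_norm / len(updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Clipping bounds any single client's influence on the aggregate, which is what makes the added Gaussian noise yield a client-level (rather than example-level) differential privacy guarantee.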