FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks

by Hui Guan, Laxmikant Kishor Mokadam, Xipeng Shen, Seung-hwan Lim, Robert M Patton
Publication Type: Conference Paper
Book Title: Proceedings of Machine Learning and Systems 2020 (MLSys 2020)
Conference Name: Third Conference on Machine Learning and Systems
Conference Location: Austin, Texas, United States of America
Conference Sponsor: Machine Learning and Systems

Parallel training of an ensemble of Deep Neural Networks (DNNs) on a cluster of nodes is an effective approach to shorten the process of neural network architecture search and hyper-parameter tuning for a given learning task. Prior efforts have shown that data sharing, where the common preprocessing operations are shared across the DNN training pipelines, saves computational resources and improves pipeline efficiency. The data-sharing strategy, however, performs poorly for a heterogeneous set of DNNs, where the DNNs have different computational needs and hence different training rates and convergence speeds. This paper proposes FLEET, a flexible ensemble DNN training framework for efficiently training a heterogeneous set of DNNs. We build FLEET via several technical innovations. We theoretically prove that finding an optimal resource allocation is NP-hard and propose a greedy algorithm to efficiently allocate resources for training each DNN with data sharing. We integrate data-parallel DNN training into ensemble training to mitigate the differences in training rates and introduce checkpointing into this context to address the issue of different convergence speeds. Experiments show that FLEET significantly improves the training efficiency of DNN ensembles without compromising the quality of the results.
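
The abstract does not spell out the greedy allocator itself; as a rough illustration of the underlying idea of balancing heterogeneous training rates under data sharing, the Python sketch below hands each spare GPU to the DNN that is currently estimated to train slowest. The per-sample costs, the linear-scaling assumption, and the function name are hypothetical; this is not FLEET's actual formulation.

```python
# Illustrative only: a toy greedy GPU allocator for a heterogeneous DNN
# ensemble sharing one preprocessing pipeline. The cost model, the linear
# scaling assumption, and all names here are hypothetical, not FLEET's
# actual algorithm (whose optimal counterpart the paper proves NP-hard).
import heapq

def greedy_allocate(per_sample_cost, num_gpus):
    """Give every DNN one GPU, then hand each spare GPU to whichever DNN
    currently has the lowest estimated throughput (samples/second)."""
    alloc = {name: 1 for name in per_sample_cost}
    spare = num_gpus - len(alloc)
    if spare < 0:
        raise ValueError("fewer GPUs than DNNs: group DNNs or add nodes")

    # Min-heap keyed by current throughput; assume a DNN with g GPUs runs at
    # roughly g / per_sample_cost samples/sec (linear scaling, an assumption).
    heap = [(alloc[name] / cost, name) for name, cost in per_sample_cost.items()]
    heapq.heapify(heap)

    for _ in range(spare):
        _, slowest = heapq.heappop(heap)   # DNN that would bottleneck data sharing
        alloc[slowest] += 1
        heapq.heappush(heap, (alloc[slowest] / per_sample_cost[slowest], slowest))
    return alloc

# Hypothetical per-sample training costs (seconds/sample on one GPU) for
# three heterogeneous DNNs, allocated across an 8-GPU node.
print(greedy_allocate({"resnet50": 0.012, "vgg16": 0.020, "mobilenet": 0.004}, 8))
```

In FLEET itself, resource allocation also interacts with data-parallel training of individual DNNs and with checkpointing of converged models; see the paper for the actual problem formulation and algorithm.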