FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks

by Hui Guan, Laxmikant Kishor Mokadam, Xipeng Shen, Seung-hwan Lim, Robert M Patton
Publication Type: Conference Paper
Book Title: Proceedings of Machine Learning and Systems 2020 (MLSys 2020)
Conference Name: Third Conference on Machine Learning and Systems
Conference Location: Austin, Texas, United States of America
Conference Sponsor: Machine Learning and Systems

Parallel training of an ensemble of Deep Neural Networks (DNNs) on a cluster of nodes is an effective approach to shorten the process of neural network architecture search and hyper-parameter tuning for a given learning task. Prior efforts have shown that data sharing, where the common preprocessing operations are shared across the DNN training pipelines, saves computational resources and improves pipeline efficiency. The data-sharing strategy, however, performs poorly for a heterogeneous set of DNNs, where the DNNs have different computational needs and hence different training rates and convergence speeds. This paper proposes FLEET, a flexible ensemble DNN training framework for efficiently training a heterogeneous set of DNNs. We build FLEET via several technical innovations. We theoretically prove that finding an optimal resource allocation is NP-hard and propose a greedy algorithm to efficiently allocate resources for training each DNN with data sharing. We integrate data-parallel DNN training into ensemble training to mitigate the differences in training rates and introduce checkpointing into this context to address the issue of different convergence speeds. Experiments show that FLEET significantly improves the training efficiency of DNN ensembles without compromising the quality of the results.
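
The abstract does not spell out the greedy allocator itself; as a rough illustration of the underlying idea of balancing heterogeneous training rates under data sharing, the Python sketch below hands each spare GPU to the DNN that is currently estimated to train slowest. The per-sample costs, the linear-scaling assumption, and the function name are hypothetical; this is not FLEET's actual formulation.

```python
# Illustrative only: a toy greedy GPU allocator for a heterogeneous DNN
# ensemble sharing one preprocessing pipeline. The cost model, the linear
# scaling assumption, and all names here are hypothetical, not FLEET's
# actual algorithm (whose optimal counterpart the paper proves NP-hard).
import heapq

def greedy_allocate(per_sample_cost, num_gpus):
    """Give every DNN one GPU, then hand each spare GPU to whichever DNN
    currently has the lowest estimated throughput (samples/second)."""
    alloc = {name: 1 for name in per_sample_cost}
    spare = num_gpus - len(alloc)
    if spare < 0:
        raise ValueError("fewer GPUs than DNNs: group DNNs or add nodes")

    # Min-heap keyed by current throughput; assume a DNN with g GPUs runs at
    # roughly g / per_sample_cost samples/sec (linear scaling, an assumption).
    heap = [(alloc[name] / cost, name) for name, cost in per_sample_cost.items()]
    heapq.heapify(heap)

    for _ in range(spare):
        _, slowest = heapq.heappop(heap)   # DNN that would bottleneck data sharing
        alloc[slowest] += 1
        heapq.heappush(heap, (alloc[slowest] / per_sample_cost[slowest], slowest))
    return alloc

# Hypothetical per-sample training costs (seconds/sample on one GPU) for
# three heterogeneous DNNs, allocated across an 8-GPU node.
print(greedy_allocate({"resnet50": 0.012, "vgg16": 0.020, "mobilenet": 0.004}, 8))
```

In FLEET itself, resource allocation also interacts with data-parallel training of individual DNNs and with checkpointing of converged models; see the paper for the actual problem formulation and algorithm.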