The peculiar nature of whole slide imaging (WSI), digitizing conventional glass slides to obtain multiple high resolution images which capture microscopic details of a patient’s histopathological features, has garnered increased interest from the computer vision research community over the last two decades. Given the unique computational space and time complexity inherent to gigapixel-size whole slide image data, researchers have proposed novel machine learning algorithms to aid in the performance of diagnostic tasks in clinical pathology. One effective algorithm represents a Whole slide image as a bag of smaller image patches, which can be represented as low-dimension image patch embeddings. Weakly supervised deep-learning methods, such as cluster-constrained-attention multiple instance learning (CLAM), have shown promising results when combined with image patch embeddings. While traditional ensemble classifiers yield improved task performance, such methods come with a steep cost in model complexity. Through knowledge distillation, it is possible to retain some performance improvements from an ensemble, while minimizing costs to model complexity. In this work, we implement a weakly supervised ensemble using clustering-constrained-attention multiple-instance learners (CLAM), which uses attention and instance-level clustering to identify task salient regions and feature extraction in whole slides. By applying logit-based and attention-based knowledge distillation, we show it is possible to retain some performance improvements resulting from the ensemble at zero cost to model complexity.