Publication

A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data...

by Wei-chen Chen, George Ostrouchov, David R Pugmire, Prabhat, Michael Wehner
Publication Type
Journal
Journal Name
Technometrics
Publication Date
Page Numbers
513 to 523
Volume
55
Issue
4

We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to
perform model-based clustering of a large climate data set. Three variants of the EM algorithm
are reformulated in parallel, and a new, faster variant is presented. All are implemented
using the single program, multiple data (SPMD) programming model, which can take
advantage of the combined memory of large distributed computer architectures to
process larger data sets. Displaying the estimated mixture model rather than the data allows
us to explore multivariate relationships in a way that scales to data of arbitrary size. We study
the performance of our methodology on simulated data and apply it to a
high-resolution climate data set produced by the Community Atmosphere Model (CAM5). This article
has supplementary material online.
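
The general idea behind running EM for a Gaussian mixture in an SPMD style can be sketched as follows. This is a minimal illustration, not the authors' implementation and not any of the variants the article describes: it assumes plain EM with full covariance matrices, and it simulates the ranks with a Python list of data chunks. Each chunk computes local sufficient statistics in the E-step, and the sum over chunks stands in for the collective reduction (for example, an MPI allreduce) that a real distributed run would use. The function names `local_stats`, `m_step`, and `em_iteration` are hypothetical.

```python
import numpy as np

def local_stats(X, weights, means, covs):
    """E-step on one chunk: return that chunk's sufficient statistics."""
    n, d = X.shape
    K = len(weights)
    log_r = np.empty((n, K))
    for k in range(K):
        diff = X - means[k]
        L = np.linalg.cholesky(covs[k])
        z = np.linalg.solve(L, diff.T)              # whitened residuals, shape (d, n)
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        log_r[:, k] = np.log(weights[k]) - 0.5 * (
            d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0)
        )
    log_norm = np.logaddexp.reduce(log_r, axis=1)   # per-point log-likelihood
    r = np.exp(log_r - log_norm[:, None])           # responsibilities
    s0 = r.sum(axis=0)                              # (K,)
    s1 = r.T @ X                                    # (K, d)
    s2 = np.einsum("nk,ni,nj->kij", r, X, X)        # (K, d, d)
    return s0, s1, s2, log_norm.sum()

def m_step(s0, s1, s2):
    """M-step from the globally reduced statistics."""
    weights = s0 / s0.sum()
    means = s1 / s0[:, None]
    covs = s2 / s0[:, None, None] - np.einsum("ki,kj->kij", means, means)
    return weights, means, covs

def em_iteration(chunks, weights, means, covs):
    """One EM iteration over a list of chunks (one chunk per simulated rank)."""
    stats = [local_stats(X, weights, means, covs) for X in chunks]
    # Assumption: this sum replaces the allreduce of a real SPMD run.
    s0, s1, s2, loglik = (sum(parts) for parts in zip(*stats))
    return m_step(s0, s1, s2) + (loglik,)
```

Because only the reduced statistics (of size independent of the number of observations) cross chunk boundaries, each simulated rank needs to hold only its own portion of the data. The returned log-likelihood is evaluated at the parameters passed in, so it can be tracked across iterations as a convergence check.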