A Distributed OpenCL Framework using Redundant Computation and Data Replication...

by Junghyun Kim, Jo Gangwon, Jung Jaehoon, Jungwon Kim, J. H. Lee
Publication Type
Conference Paper
Page Numbers
553 to 569
Conference Name
ACM SIGPLAN Conference on Programming Language Design and Implementation
Conference Location
Santa Barbara, California, United States of America

Applications written solely in OpenCL or CUDA cannot run across an entire cluster. Most previous approaches that extend these programming models to clusters share a common idea: designate a centralized host node that coordinates the other nodes for computation. However, the centralized host node becomes a serious performance bottleneck when the number of nodes is large. In this paper, we propose SnuCL-D, a scalable and distributed OpenCL framework for large-scale clusters. SnuCL-D's remote device virtualization gives an OpenCL application the illusion that all compute devices in a cluster reside in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show its effectiveness, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
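
The contrast between a centralized host and replicated host execution can be illustrated with a toy model. The sketch below is not SnuCL-D's implementation: the `Command` type, the block layout of devices across nodes, and the message-counting rule are all assumptions made for illustration. It also ignores the data transfers a real system needs to keep replicated buffers consistent. It only shows why a single host must send one control message per remote command, while nodes that each run the same host program can select their local commands without any forwarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """One host-side command aimed at a virtual device (hypothetical model)."""
    name: str
    device: int  # global index of the target device


def owner(device: int, devices_per_node: int) -> int:
    """Node that physically hosts a device, assuming a simple block layout."""
    return device // devices_per_node


def centralized_messages(commands, devices_per_node, host_node=0):
    """Control messages a single host sends: one per command on a remote device."""
    return sum(
        1 for c in commands if owner(c.device, devices_per_node) != host_node
    )


def replicated_run(commands, num_nodes, devices_per_node):
    """Every node walks the same command list and keeps only its local commands.

    Returns (executed, control_messages), where executed maps each node
    to the names of the commands it ran on its own devices.
    """
    executed = {node: [] for node in range(num_nodes)}
    for node in range(num_nodes):  # each node runs the same host program
        for c in commands:
            if owner(c.device, devices_per_node) == node:
                executed[node].append(c.name)
    # In this simplified model no command is forwarded between nodes.
    return executed, 0


# Eight devices spread over four nodes, one kernel launch per device.
commands = [Command(f"kernel{d}", d) for d in range(8)]
central = centralized_messages(commands, devices_per_node=2)
executed, msgs = replicated_run(commands, num_nodes=4, devices_per_node=2)
print(central, msgs, executed[3])
```

In this model the centralized host sends a message for every command whose device lives on another node, so its outgoing traffic grows with the cluster size, whereas the replicated variant trades that traffic for redundant execution of the host program on every node.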