Publication

Performance implications from sizing a VM on multi-core systems: A Data analytic application’s view...

by Seung-hwan Lim, James L Horey, Edmon Begoli, Yanjun Yao, Qing Cao
Publication Type
Conference Paper
Publication Date
Conference Name
High-Performance Grid and Cloud Computing Workshop (in conjunction with IPDPS)
Conference Location
Boston, Massachusetts, United States of America
Conference Date

In this paper, we present a quantitative performance analysis of data analytics applications running on multi-core virtual machines. Such environments form the core of cloud computing. In addition, data analytics applications, such as Cassandra and Hadoop, are becoming increasingly popular on cloud computing platforms. This convergence necessitates a better understanding of the performance and cost implications of such hybrid systems. For example, the very first step in hosting applications in virtualized environments requires the user to configure the number of virtual processors and the size of memory. To understand the performance implications of this step, we benchmarked three Yahoo Cloud Serving Benchmark (YCSB) workloads in a virtualized multi-core environment. Our measurements indicate that the performance of Cassandra for YCSB workloads does not heavily depend on the processing capacity of a system, while the size of the data set relative to allocated memory is critical to performance. We also identified a strong relationship between the running time of workloads and various hardware events (last-level cache loads, misses, and CPU migrations). From this analysis, we provide several suggestions to improve the performance of data analytics applications running in cloud computing environments.