Using a Genetic Algorithm to Optimize Configurations in a Data-Driven Application...

by Urjoshi Sinha, Mikaela C Mcdevitt, Myra Cohen
Publication Type: Conference Paper
Journal Name: 12th Symposium on Search-Based Software Engineering
Book Title: SSBSE: International Symposium on Search Based Engineering
Page Numbers: 137 to 152
Conference Name: 12th Symposium on Search-Based Software Engineering (SSBSE)
Conference Location: Bari, Italy

Users of highly configurable software systems often want to optimize a particular objective, such as improving a functional outcome or increasing system performance. One approach is to use an evolutionary algorithm. However, many applications today are data-driven, meaning they depend on inputs or data that can be complex and varied. Hence, a search needs to be run (and re-run) for every input, making optimization a heavyweight and potentially impractical process. In this paper, we explore this issue on a data-driven, highly configurable scientific application. We build an exhaustive database containing 3,000 configurations and 10,000 inputs, leading to almost 100 million records as our oracle, and then run a genetic algorithm individually on each of the 10,000 inputs. We ask (1) whether a genetic algorithm can find configurations that improve functional objectives; (2) whether patterns of best configurations emerge across all input data; and (3) whether we can use sampling to approximate the results. We find that the original (default) configuration is best only 34% of the time, while clear patterns of other best configurations emerge. Out of 3,000 possible configurations, only 112 distinct configurations achieve the optimal result at least once across all 10,000 inputs, suggesting the potential for lighter-weight optimization approaches. We show that sampling the input data finds similar patterns at a lower cost.
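
The workflow the abstract describes (a precomputed oracle, one genetic algorithm run per input, then a count of which configurations win) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the oracle holds random scores, the option values, population size, and mutation rate are hypothetical, and the toy sizes (60 configurations, 40 inputs) stand in for the 3,000 and 10,000 used in the study. Because the scores are random, the output shows only the bookkeeping, not the patterns the paper reports.

```python
import itertools
import random
from collections import Counter

# Toy stand-ins (assumptions): the study uses 3,000 configurations and 10,000 inputs.
OPTION_VALUES = [(0, 1, 2, 3), (0, 1, 2), (0, 1, 2, 3, 4)]  # hypothetical options
ALL_CONFIGS = list(itertools.product(*OPTION_VALUES))       # 60 configurations
DEFAULT_CONFIG = ALL_CONFIGS[0]                              # assumed default
N_INPUTS = 40


def build_oracle(rng):
    """Synthetic oracle: oracle[i][config] is the objective value on input i."""
    return [{cfg: rng.random() for cfg in ALL_CONFIGS} for _ in range(N_INPUTS)]


def tournament(population, scores, rng):
    """Pick the fitter of two randomly chosen individuals."""
    a, b = rng.sample(population, 2)
    return a if scores[a] >= scores[b] else b


def run_ga(scores, rng, pop_size=12, generations=25, mutation_rate=0.1):
    """Run a small GA for one input and return the best configuration seen."""
    population = [rng.choice(ALL_CONFIGS) for _ in range(pop_size)]
    best = max(population, key=scores.get)
    for _ in range(generations):
        next_population = [best]  # elitism: carry the best individual forward
        while len(next_population) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally resample a gene from its option's values.
            for g, values in enumerate(OPTION_VALUES):
                if rng.random() < mutation_rate:
                    child[g] = rng.choice(values)
            next_population.append(tuple(child))
        population = next_population
        best = max(population, key=scores.get)
    return best


def best_config_counts(oracle, input_ids, rng):
    """Run the GA once per input and count how often each configuration wins."""
    return Counter(run_ga(oracle[i], rng) for i in input_ids)


rng = random.Random(0)
oracle = build_oracle(rng)

full = best_config_counts(oracle, range(N_INPUTS), rng)
sample_ids = rng.sample(range(N_INPUTS), 10)  # sampling a subset of the inputs
sampled = best_config_counts(oracle, sample_ids, rng)

print("distinct best configs (all inputs):", len(full))
print("default is best on", full[DEFAULT_CONFIG], "of", N_INPUTS, "inputs")
print("distinct best configs (sample):", len(sampled))
```

Looking up fitness in a table rather than executing the application is what makes it cheap to repeat the search for every input. Comparing the `full` and `sampled` counters mirrors the abstract's third question in miniature.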