Research Highlight

Data Augmentation for Reinforcement Learning

Brief: The research team generated synthetic data that preserves both the feature distributions and the temporal dynamics in the original data.

Accomplishment: The research team generated synthetic data that can be used to train reinforcement-learning-based control systems that improve building energy efficiency.  The synthetic data was designed to preserve both the feature distributions and the temporal dynamics of the original data and to accelerate training.  Using this synthetic data will enable model-free deep reinforcement learning (RL) to learn faster with less data and fewer interactions with the environment, which will pave the way for deploying RL at large scale across domains.  RL is a powerful tool and has performed very well on control problems such as building energy systems, but its need for large amounts of training data has been a barrier to its use.  Given the Department of Energy (DOE) mission of net-zero emissions by 2050, an adaptable and scalable control algorithm is crucial.

Description of work: Model-free RL is a powerful tool that has shown promising results in many domains such as robotics, game playing, and building controls. Because model-free RL algorithms learn optimal control policies by continuously interacting with their environments, they require a lot of data, which limits their application across a wide range of domains. In this quarter, a synthetic dataset was generated with TimeGAN [1] and evaluated.  One of the use cases under study in this project is optimizing the energy usage of a water heater with a deep RL algorithm.  For this reason, the generated synthetic data consists of water heater time series that give the state of the water heater at each time step.
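Sequence generators such as TimeGAN are typically trained on fixed-length windows of a scaled time series. The sketch below shows one way such windows could be prepared. It is an illustration only: the helper name `make_windows`, the window length, and the demo values are assumptions and are not taken from the project.

```python
import numpy as np

def make_windows(series, seq_len):
    """Min-max scale each feature to [0, 1], then cut overlapping windows.

    series: array of shape (n_steps, n_features), one row per time step.
    Returns (windows, lo, hi); windows has shape
    (n_steps - seq_len + 1, seq_len, n_features).
    """
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - seq_len + 1
    if n_windows < 1:
        raise ValueError("series is shorter than seq_len")
    lo = series.min(axis=0)
    hi = series.max(axis=0)
    # Guard against constant features to avoid division by zero.
    scale = np.where(hi > lo, hi - lo, 1.0)
    scaled = (series - lo) / scale
    windows = np.stack([scaled[i:i + seq_len] for i in range(n_windows)])
    return windows, lo, hi

# Hypothetical data: 10 time steps, 3 features (values are made up).
rng = np.random.default_rng(0)
demo = rng.uniform(40.0, 60.0, size=(10, 3))
windows, lo, hi = make_windows(demo, seq_len=4)
print(windows.shape)
```

Keeping `lo` and `hi` allows generated windows to be mapped back to physical units after sampling.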

The team’s process to generate a synthetic water heater dataset included five main steps.  First, a water heater simulator was developed and calibrated using data collected from an actual water heater.  Second, an original water heater dataset was created using the simulator.  Third, embedding, recovery, supervisor, discriminator, and generator models were trained on the original dataset following TimeGAN.  Fourth, a synthetic dataset was generated from random noise using the trained set of models.  Finally, the generated dataset was evaluated using discriminative and predictive scores (see data visualization figures).  These scores were calculated for original datasets of various lengths.  The length of the original dataset did not make a significant difference in performance.  Thus, data augmentation in an RL task can begin after as little as one day of operation: the RL algorithm can use synthetic data produced by the trained TimeGAN models after only 1 day of interaction with the environment.  For qualitative analysis, Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) plots of the original and synthetic data were used.
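The two evaluation scores can be illustrated with a simplified sketch. In the TimeGAN paper [1] both scores rely on post-hoc recurrent networks; the version below swaps in a logistic-regression classifier and a linear least-squares predictor so that it runs with NumPy alone, and its numbers are therefore not comparable to the team's reported scores. The function names, hyperparameters, and demo data are all assumptions.

```python
import numpy as np

def discriminative_score(real, synth, epochs=500, lr=0.5, seed=0):
    """|accuracy - 0.5| of a classifier separating real from synthetic windows.

    real, synth: arrays of shape (n_windows, seq_len, n_features).
    Lower is better: 0 means the classifier does no better than chance.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([real, synth]).reshape(len(real) + len(synth), -1)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    split = int(0.8 * len(x))  # 80% train, 20% held-out test
    mean, std = x[:split].mean(axis=0), x[:split].std(axis=0) + 1e-8
    x = (x - mean) / std
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):  # plain gradient descent on the logistic loss
        logits = np.clip(x[:split] @ w + b, -30.0, 30.0)
        grad = 1.0 / (1.0 + np.exp(-logits)) - y[:split]
        w -= lr * x[:split].T @ grad / split
        b -= lr * grad.mean()
    pred_real = (x[split:] @ w + b) > 0
    accuracy = (pred_real == (y[split:] == 1)).mean()
    return abs(accuracy - 0.5)

def predictive_score(real, synth):
    """MAE on real windows of a linear predictor fit on synthetic windows.

    The predictor maps all but the last step of a window to the last step.
    """
    def split_xy(windows):
        n = len(windows)
        inputs = windows[:, :-1, :].reshape(n, -1)
        inputs = np.hstack([inputs, np.ones((n, 1))])  # bias column
        return inputs, windows[:, -1, :]
    x_s, y_s = split_xy(synth)
    x_r, y_r = split_xy(real)
    coef, *_ = np.linalg.lstsq(x_s, y_s, rcond=None)
    return np.abs(x_r @ coef - y_r).mean()

# Made-up demo data: one synthetic set matches the real one, one does not.
rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(200, 4, 2))
close = rng.normal(0.0, 1.0, size=(200, 4, 2))
shifted = rng.normal(3.0, 1.0, size=(200, 4, 2))
print(discriminative_score(real, close), discriminative_score(real, shifted))
print(predictive_score(real, close))
```

A discriminative score near 0 suggests the synthetic windows are hard to tell apart from real ones. A low predictive score suggests a model trained only on synthetic data still predicts real data well.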

The results showed that the generated synthetic data captured the properties of the original data. Specifically, the generated dataset mostly maintained the original relationships between variables across time.  For example, when the temperature was below the setpoint, the heat pump (HP) was on, which increased the temperatures.  Also, the node temperatures (T1, T2, …, T6) were lower in the lower nodes, which indicates that the relationships between the features are preserved.  This is particularly important for RL applications, which are formulated as Markov decision processes (MDPs) and therefore learn from transitions between states.
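Relationships like these can be checked numerically on a generated dataset. The sketch below computes, for each relationship, the fraction of time steps at which it holds. Several things are assumptions made for illustration: the function name, the column order of the node temperatures (top node first), and which temperature is compared against the setpoint. A real controller may also use a deadband, which this check ignores.

```python
import numpy as np

def consistency_rates(node_temps, control_temp, setpoint, hp_on, tol=0.0):
    """Fraction of time steps at which two expected relationships hold.

    node_temps:   (n_steps, n_nodes), columns ordered top node to bottom
                  node (assumed ordering).
    control_temp: (n_steps,) temperature compared with the setpoint
                  (assumed to be supplied by the caller).
    setpoint:     scalar or (n_steps,) array.
    hp_on:        (n_steps,) boolean or 0/1 heat pump status.
    tol:          allowed temperature increase from one node to the next.
    """
    node_temps = np.asarray(node_temps, dtype=float)
    control_temp = np.asarray(control_temp, dtype=float)
    hp_on = np.asarray(hp_on).astype(bool)
    # Stratification: each node is no warmer than the node above it.
    stratified = np.all(np.diff(node_temps, axis=1) <= tol, axis=1)
    # Control: among steps below the setpoint, how often the heat pump is on.
    below = control_temp < setpoint
    hp_rate = float(hp_on[below].mean()) if below.any() else float("nan")
    return {
        "stratified": float(stratified.mean()),
        "hp_on_when_below_setpoint": hp_rate,
    }

# Made-up example with three time steps and three nodes.
rates = consistency_rates(
    node_temps=[[50, 48, 45], [50, 51, 45], [49, 47, 46]],
    control_temp=[45, 55, 40],
    setpoint=50,
    hp_on=[1, 0, 0],
)
print(rates)
```

Comparing these rates between the original and synthetic datasets gives a rough quantitative counterpart to the statement that the relationships were "mostly maintained".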

Figure: The Resulting Tuples table at the top shows examples of generated tuples alongside original tuples. The performance table in the middle compares performance across original datasets of different lengths. The two figures at the bottom visualize the original and synthetic data distributions using PCA and t-SNE analysis. (CCSD AI Initiative, ORNL)

Acknowledgement: This research was funded by the AI Initiative, as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy (DOE).

Contact: Helia Zandi (zandih@ornl.gov)

Team: Helia Zandi, Kadir Amasyali, Kuldeep Kurte, Jeffrey Munk

References

[1] Yoon, J., Jarrett, D., & Van der Schaar, M. (2019). Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32.