Research Highlight

Generalizable Flow-based Policy Learning in Reinforcement Learning

Brief: Oak Ridge National Laboratory (ORNL) researchers have developed a novel, generalizable policy learning approach in deep reinforcement learning (RL) for future applications in additive manufacturing.

Accomplishment: We developed a multimodal policy distribution via normalizing flows, which offers richer flexibility to capture arbitrarily complex policy distributions. The proposed flow-based Soft Actor-Critic (F-SAC) significantly improves policy exploration and generalization compared with the unimodal Gaussian policy in vanilla SAC. F-SAC is benchmarked on several examples (e.g., soft-body manipulation tasks from PlasticineLab) and shows improvements over existing methods.

Background: Generalization is critical when learning policies to interact with real physical systems in RL. Many real-world applications that involve generalizable policy learning, such as object manipulation and autonomous driving, motivate us to develop intelligent agents that learn generalizable and reliable policies. SAC provides a solid off-policy algorithm within the maximum entropy RL framework. Although SAC offers greater stability, improved exploration, and efficiency, its policy generalization and expressive capability are severely limited by its Gaussian reparametrization. Normalizing flows, on the other hand, allow for effective learning of non-Gaussian policies.
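To illustrate the core idea, the sketch below shows a toy flow-based policy: a sample from a standard Gaussian base distribution is passed through an invertible transform, and the log-probability is tracked via the change-of-variables formula before a tanh squash bounds the action (as in SAC). This is a hypothetical minimal example with a single state-independent affine flow; the F-SAC method described here uses learned, state-conditioned normalizing flows.

```python
import numpy as np


class ToyFlowPolicy:
    """Minimal sketch of a flow-based policy: Gaussian base -> affine
    flow -> tanh squash. Parameters are fixed random values here; in a
    real F-SAC implementation they would be learned and conditioned on
    the state (this class is illustrative only)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        # Invertible affine flow parameters (stand-ins for a trained flow).
        self.log_scale = self.rng.normal(scale=0.1, size=dim)
        self.shift = self.rng.normal(scale=0.1, size=dim)

    def sample(self):
        # 1) Draw latent z from the standard Gaussian base distribution.
        z = self.rng.standard_normal(self.dim)
        log_p = -0.5 * np.sum(z**2 + np.log(2.0 * np.pi))
        # 2) Invertible affine transform: y = z * exp(log_scale) + shift.
        #    Change of variables: log p(y) = log p(z) - sum(log_scale).
        y = z * np.exp(self.log_scale) + self.shift
        log_p -= np.sum(self.log_scale)
        # 3) Tanh squash to bound actions, with the Jacobian correction
        #    log p(a) = log p(y) - sum(log(1 - tanh(y)^2)).
        a = np.tanh(y)
        log_p -= np.sum(np.log(1.0 - a**2 + 1e-6))
        return a, log_p


policy = ToyFlowPolicy(dim=2)
action, log_prob = policy.sample()
```

Stacking several such invertible transforms (e.g., coupling layers) is what lets a flow represent multimodal action distributions that a single Gaussian reparametrization cannot.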

(Left above) Normalizing flow framework for policy learning; (right above) Sampling method for latent random variable for variance reduction in normalizing flows; (below) flow-based SAC performance on soft-body manipulation tasks from PlasticineLab in reinforcement learning.

Future Work: We seek to apply the F-SAC method to the scan pattern optimization problem in additive manufacturing to further develop ORNL's world-leading capabilities in this space.

Publication resulting from this work:   
Sirui Bi, Benjamin Stump, Alex Plotkowski, and Jiaxin Zhang. "Generalizable Flow-based Policy Learning in Soft-Body Manipulation." ICML 2022 workshop (to be submitted).

Acknowledgement: This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725.

Contact: Sirui Bi (bsi1@ornl.gov)

Team: Benjamin Stump, Sirui Bi, Jiaxin Zhang, and Alex Plotkowski