MERL: Multi-Head Reinforcement Learning


A central challenge in reinforcement learning is sampling an environment efficiently, so that the agent's interactions translate into fast, robust learning and high performance on complex tasks. Earlier work addresses this by injecting domain or prior knowledge into existing reinforcement learning algorithms; while promising, such knowledge is often costly to acquire and hard to scale. We instead consider problem knowledge: signals from any quantity relevant to solving many tasks, e.g., self-performance assessment and accurate expectations. We propose MERL, a general framework for structuring reinforcement learning by injecting problem knowledge into policy-gradient updates. Unlike other auxiliary-task methods, MERL is applicable to any task. As a result, the policy and value functions are no longer optimized for the reward alone but are learned using task-agnostic quantities as well. In this paper: (a) we introduce and define MERL, our new multi-head reinforcement learning framework; (b) we conduct experiments across a variety of standard benchmark environments, including 9 continuous control tasks, where results show improved performance; (c) we demonstrate that MERL also improves transfer learning on a set of challenging tasks; (d) we investigate how our approach mitigates reward sparsity and better conditions the feature space of deep reinforcement learning agents.
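The multi-head idea described above, a shared representation feeding a policy head, a value head, and additional heads trained on problem-knowledge targets whose losses are added to the policy-gradient objective, can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the paper's implementation: the network sizes, the number of auxiliary heads, their regression targets, and the loss weights `c_v` and `c_aux` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiHeadNet:
    """Shared trunk feeding policy, value, and auxiliary heads (illustrative sketch)."""
    def __init__(self, obs_dim, n_actions, hidden=32, n_aux=2):
        self.W_trunk = rng.normal(0, 0.1, (obs_dim, hidden))
        self.W_pi = rng.normal(0, 0.1, (hidden, n_actions))   # policy head
        self.W_v = rng.normal(0, 0.1, (hidden, 1))            # value head
        # Auxiliary heads predict task-agnostic quantities
        # (e.g. a self-performance estimate) -- hypothetical choices.
        self.W_aux = [rng.normal(0, 0.1, (hidden, 1)) for _ in range(n_aux)]

    def forward(self, obs):
        h = np.tanh(obs @ self.W_trunk)                # shared features
        logits = h @ self.W_pi
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)          # softmax policy
        value = (h @ self.W_v).squeeze(-1)
        aux = [(h @ W).squeeze(-1) for W in self.W_aux]
        return probs, value, aux

def merl_loss(probs, actions, advantages, value, returns,
              aux_preds, aux_targets, c_v=0.5, c_aux=0.1):
    """Policy-gradient loss augmented with auxiliary-head regression losses."""
    logp = np.log(probs[np.arange(len(actions)), actions])
    pg_loss = -(logp * advantages).mean()                       # policy gradient
    v_loss = ((value - returns) ** 2).mean()                    # value regression
    aux_loss = sum(((p - t) ** 2).mean()                        # problem-knowledge heads
                   for p, t in zip(aux_preds, aux_targets))
    return pg_loss + c_v * v_loss + c_aux * aux_loss
```

Because every head shares the same trunk, gradients from the auxiliary losses shape the features used by the policy and value heads, which is how such signals can help even when the environment reward is sparse.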

Deep Reinforcement Learning Workshop (NeurIPS)