Abstract
Model-based reinforcement learning (RL) is a widely accepted way to reduce the
excessive sample demands of RL. However, the predictions of the dynamics models
are often not accurate enough, and the resulting bias may lead to catastrophic
decisions when the algorithm is insufficiently robust. It is therefore highly
desirable to investigate how to improve the robustness of model-based RL
algorithms while maintaining high sample efficiency. In this paper, we propose Model-Based
Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP
consists of two dropout mechanisms: rollout-dropout aims to
improve robustness at a small cost in sample efficiency, while
model-dropout is designed to compensate for the lost efficiency at a slight
expense of robustness. By combining them in a complementary way, MBDP provides
a flexible control mechanism to meet different demands for robustness and
efficiency by tuning the two corresponding dropout ratios. The effectiveness of
MBDP is demonstrated both theoretically and experimentally.