Sample Efficient Deep Reinforcement Learning With Online State Abstraction and Causal Transformer Model Prediction.

Abstract

Deep reinforcement learning (RL) typically requires a tremendous number of training samples, which is impractical in many applications. State abstraction and world models are two promising approaches for improving sample efficiency in deep RL; however, both may degrade learning performance. In this article, we propose an abstracted model-based policy learning (AMPL) algorithm, which improves the sample efficiency of deep RL. In AMPL, a novel state abstraction method via multistep bisimulation is first developed to learn task-related latent state spaces, so that the original Markov decision processes (MDPs) are compressed into abstracted MDPs. Then, a causal transformer model predictor (CTMP) is designed to approximate the abstracted MDPs and generate long-horizon simulated trajectories with a smaller multistep prediction error. Policies are efficiently learned from these trajectories within the abstracted MDPs via a modified multistep soft actor-critic algorithm with a λ-target. Moreover, theoretical analysis shows that the AMPL algorithm can improve sample efficiency during training. On Atari games and the DeepMind Control (DMControl) suite, AMPL surpasses current state-of-the-art deep RL algorithms in terms of sample efficiency. Furthermore, experiments on DMControl tasks with moving noise demonstrate that AMPL is robust to task-irrelevant observational distractors and significantly outperforms existing approaches.
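The abstract does not give the exact form of the λ-target used in the modified multistep soft actor-critic, so the snippet below is only a minimal sketch of how a soft λ-return is commonly computed over a model-generated rollout. The function name soft_lambda_returns and the parameters alpha, gamma, and lam are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def soft_lambda_returns(rewards, values, log_probs, alpha=0.2, gamma=0.99, lam=0.95):
    """Sketch of a soft lambda-return over a simulated rollout of length H.

    rewards:   r_1 ... r_H          predicted rewards along the rollout
    values:    V(s_1) ... V(s_{H+1}) critic values for each visited latent state
    log_probs: log pi(a|s_2) ... log pi(a|s_{H+1})  for actions sampled at the
               bootstrap states (entropy bonus, as in soft actor-critic)
    """
    H = len(rewards)
    # Soft bootstrap values: subtract the scaled log-probability (entropy term).
    soft_v = np.asarray(values[1:]) - alpha * np.asarray(log_probs)
    returns = np.empty(H)
    # Blend n-step targets backwards; the last step reduces to r_H + gamma * V(s_{H+1}).
    next_return = soft_v[-1]
    for t in reversed(range(H)):
        returns[t] = rewards[t] + gamma * ((1 - lam) * soft_v[t] + lam * next_return)
        next_return = returns[t]
    return returns
```

Under these assumptions, each step mixes the one-step soft bootstrap with the longer-horizon return via lam, which is the usual way a λ-target trades off bias against the compounding error of long model rollouts.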
