[LR] TD-MPC2

[This review is intended solely for my personal learning]

Paper Info
Title: TD-MPC2: Scalable, Robust World Models for Continuous Control
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
Conference: ICLR 2024
arXiv: 2310.16828

Prior Knowledge
Model-Based Reinforcement Learning (MBRL): Learns or uses an internal model of the environment to plan and optimize actions, rather than learning a policy from direct interaction alone.
Temporal Difference (TD) Learning: Estimates a value function iteratively by bootstrapping: each estimate is moved toward the observed reward plus the discounted value estimate of the next state.
Model Predictive Control (MPC): An optimization framework that selects actions by planning over a finite horizon with a model of the system (here, a learned world model), executing the first action and then replanning.
TD-MPC: A prior algorithm that performs local trajectory optimization in the latent space of an implicit world model, but lacks scalability and robustness.

Goal
The authors propose TD-MPC2, an extension of the TD-MPC framework designed to scale reinforcement learning to large, uncurated datasets and to generalize across multiple continuous control tasks. The key aims are: ...
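As a small illustration of the bootstrapping idea behind the Temporal Difference Learning item under Prior Knowledge, here is a minimal tabular TD(0) sketch. It is not the TD-MPC2 training objective, which operates on learned latent states; the three-state chain, the function name `td0_update`, and the values of `alpha` and `gamma` are all assumptions made up for this example.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Bootstrapped target: observed reward plus discounted estimate of the next state.
    # At a terminal transition there is no next-state value to bootstrap from.
    bootstrap = 0.0 if done else gamma * values[next_state]
    td_error = reward + bootstrap - values[state]
    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical deterministic chain: state 0 -> state 1 -> state 2 (terminal),
# with reward 1.0 only on the final transition.
values = [0.0, 0.0, 0.0]
for _ in range(500):
    td0_update(values, 0, 0.0, 1, alpha=0.1, gamma=0.9)
    td0_update(values, 1, 1.0, 2, alpha=0.1, gamma=0.9, done=True)

print(values)
```

In this toy chain the estimate for state 1 approaches the final reward, and the estimate for state 0 approaches that reward discounted once by `gamma`, even though state 0 never receives a reward directly. That propagation through the next state's estimate is what "bootstrapped" refers to.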

November 21, 2024 · 4 min