Mastering Diverse Domains through World Models (DreamerV3) | Product | 26/100 | via “world-model-based reinforcement learning with latent imagination”
* ⏫ 02/2023: [Grounding Large Language Models in Interactive Environments with Online RL (GLAM)](https://arxiv.org/abs/2302.02662)
Unique: DreamerV3 learns a world model in a compact latent space and trains its actor and critic on trajectories imagined in that same space. It applies the symlog transform to predictions and targets, so rewards spanning many orders of magnitude can be learned without task-specific normalization. Unlike earlier world-model methods (PlaNet, Dreamer v1/v2), it performs strongly on both visual control and Atari without architectural changes. This comes from improved training stability and a world-model loss that balances prediction (reconstruction), dynamics, and representation terms, with the actor and critic optimized on imagined rollouts.
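To make the symlog idea concrete, here is a minimal sketch of the transform and its inverse, symlog(x) = sign(x) * ln(|x| + 1) and symexp(y) = sign(y) * (exp(|y|) - 1). It uses only the Python standard library on scalar values; the function names and the sample rewards are illustrative assumptions, not the authors' implementation, which operates on tensors inside the network losses.

```python
import math


def symlog(x: float) -> float:
    """Compress magnitude symmetrically: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


if __name__ == "__main__":
    # Hypothetical rewards chosen to span many orders of magnitude.
    for reward in (-1e6, -1.0, 0.0, 0.5, 1e3, 1e9):
        squashed = symlog(reward)
        restored = symexp(squashed)
        print(f"reward={reward:g}  symlog={squashed:.3f}  symexp(symlog)={restored:g}")
```

Near zero the transform is close to the identity, while large magnitudes grow only logarithmically, which is why a single set of hyperparameters can plausibly cope with both small and very large reward scales.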
vs others: Reported to be 10-100x more sample-efficient than model-free methods (PPO, SAC) and to match or exceed model-based alternatives (MBPO, SLAC), while requiring no task-specific reward normalization or domain adaptation, which makes it more practical across diverse visual domains.