Physics-Aware Policy Learning from High-Dimensional Visual Observations

1

Mastering Diverse Domains through World Models (DreamerV3) | Product | 24/100

via “multi-task visual policy learning with task-agnostic world models”

* ⏫ 02/2023: [Grounding Large Language Models in Interactive Environments with Online RL (GLAM)](https://arxiv.org/abs/2302.02662)

Unique: DreamerV3's task-agnostic world model learns shared visual representations without explicit task conditioning, relying on the policy learning objective to extract task-relevant information from the shared latent space. This contrasts with task-conditioned approaches (e.g., MTRL baselines) that explicitly encode task identity, making DreamerV3 more flexible for discovering emergent task structure.

vs others: Achieves better sample efficiency and generalization than task-conditioned baselines by learning task-invariant visual dynamics, while avoiding the computational overhead of task-specific world models or explicit task embeddings.

2

Outracing champion Gran Turismo drivers with deep reinforcement learning (Sophy) | Product | 23/100

via “physics-aware policy learning from high-dimensional visual observations”

* ⭐ 02/2022: [Magnetic control of tokamak plasmas through deep reinforcement learning](https://www.nature.com/articles/s41586-021-04301-9)

Unique: Trains end-to-end CNN policies directly on high-resolution camera images by leveraging Gran Turismo's differentiable physics engine. This enables joint gradient-based optimization of visual perception and control, rather than relying on separate perception and planning modules.

vs others: Achieves better sample efficiency and generalization than modular approaches (separate perception + planning) because the visual features are optimized directly for control rather than for generic object detection.

3

Learning robust perceptive locomotion for quadrupedal robots in the wild | Product | 21/100

via “vision-based locomotion policy learning from real-world robot trajectories”

* ⭐ 02/2022: [BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning](https://proceedings.mlr.press/v164/jang22a.html)

Unique: Directly trains end-to-end visuomotor policies on real-world robot trajectories without simulation, using robust data augmentation and domain randomization techniques to handle the distribution shift between training and deployment environments. The approach captures implicit terrain understanding through visual features rather than explicit terrain classification.

vs others: Outperforms pure simulation-based approaches by training on real sensor data and terrain interactions, and exceeds hand-crafted controllers by learning adaptive behaviors from diverse demonstrations without manual parameter tuning.
