Reward Shaping And Curriculum Learning For Complex Locomotion Tasks

1

Outracing champion Gran Turismo drivers with deep reinforcement learning (Sophy)Product22/100

via “multi-agent reinforcement learning with curriculum learning for complex control tasks”

* ⭐ 02/2022: [Magnetic control of tokamak plasmas through deep reinforcement learning](https://www.nature.com/articles/s41586-021-04301-9%E2%80%A6)

Unique: Uses a carefully designed curriculum learning pipeline with progressive difficulty stages (single-agent time trials → multi-agent racing → championship scenarios) combined with distributed PPO training across GPU clusters, enabling agents to learn racing strategies that exceed human champion performance without explicit reward shaping for racing-specific behaviors

vs others: Outperforms imitation learning and hand-crafted reward functions by learning emergent racing strategies through self-play and curriculum progression, achieving superhuman lap times where supervised learning from human demonstrations plateaus

2

Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (ANYmal)Product21/100

* ⭐ 10/2022: [Discovering faster matrix multiplication algorithms with reinforcement learning (AlphaTensor)](https://www.nature.com/articles/s41586-022%20-05172-4)

Unique: Combines multi-component reward shaping with progressive curriculum learning, where task difficulty increases automatically as policy performance improves, enabling stable training toward complex locomotion objectives

vs others: Guides RL training toward natural, energy-efficient gaits by decomposing objectives into weighted reward components and progressively increasing difficulty, compared to sparse reward or single-objective approaches

3

Learning robust perceptive locomotion for quadrupedal robots in the wildProduct20/100

via “zero-shot task generalization through behavior cloning with latent embeddings”

* ⭐ 02/2022: [BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning](https://proceedings.mlr.press/v164/jang22a.html)

Unique: Uses a learned latent embedding space to decouple task representation from low-level motor control, enabling interpolation between behaviors without explicit task-specific training. The architecture learns a continuous task manifold where similar locomotion behaviors cluster, allowing the policy to generalize to unseen task combinations.

vs others: Achieves better generalization than single-task imitation learning and requires less task-specific data than multi-task reinforcement learning approaches, while maintaining real-world applicability through behavior cloning rather than simulation-based training.

Top Matches

Also Known As

Company