Outracing champion Gran Turismo drivers with deep reinforcement learning (Sophy)Product24/100 via “reward function design and shaping for complex multi-objective tasks”
* ⭐ 02/2022: [Magnetic control of tokamak plasmas through deep reinforcement learning](https://www.nature.com/articles/s41586-021-04301-9%E2%80%A6)
Unique: Combines potential-based reward shaping with multi-objective weighting to balance lap time, safety, and race position, using domain knowledge about racing physics to structure rewards that guide learning without over-constraining agent behavior or creating conflicting gradient signals
vs others: Achieves better policy robustness than single-objective rewards (lap time only) by explicitly balancing safety and race performance, and better sample efficiency than inverse RL approaches by leveraging domain knowledge to structure rewards directly