Policies trained in single-friction environments perform well when evaluated on similar environments. However, they fail when multiple friction coefficients are present in the same environment, because they cannot account for switches between modes.
This motivates training in environments with mode transitions.
Friction 1.0
Rollout on single-friction environments. Trajectory and Total Rewards Plot. Track Progress Rewards.
Friction 0.3
Rollout on single-friction (0.3) environments. Trajectory and Total Rewards Plot. Track Progress Rewards.
The friction coefficient is 0.5 everywhere except inside the cyan boxes, where it is 0.3.
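As a rough illustration of such a mixed-friction layout, the sketch below looks up the friction coefficient at a position on the track. Only the values 0.5 and 0.3 come from the description above. The `Box` class, the `friction_at` helper, and the patch coordinates are hypothetical and are not taken from the actual environment code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned low-friction patch. Coordinates are hypothetical."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


BASE_FRICTION = 0.5   # friction outside the cyan boxes (from the text)
PATCH_FRICTION = 0.3  # friction inside the cyan boxes (from the text)


def friction_at(x: float, y: float, patches: list[Box]) -> float:
    """Return the friction coefficient at position (x, y)."""
    for box in patches:
        if box.contains(x, y):
            return PATCH_FRICTION
    return BASE_FRICTION


# Illustrative patch placement only; the real layout is shown by the cyan boxes.
patches = [Box(2.0, 0.0, 4.0, 1.0), Box(7.0, 3.0, 9.0, 4.0)]
```

A policy that crosses a patch boundary sees the coefficient switch between 0.5 and 0.3 within one episode, which is the kind of mode transition that single-friction training never exposes it to.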
Left: SAC baseline struggles with previously unseen racetrack layouts and changing surfaces. Right: Diffusion agents handle unseen layouts and surface changes more robustly.
Hopper on unseen contact dynamics induced by terrain height variation.
Left: SAC baseline struggles with previously unseen contact dynamics. Right: Diffusion agents handle unseen contact dynamics and are better able to stabilize the system.