Learning Adaptive Diffusion Policies for Hybrid Dynamical Systems

University of Waterloo
† Equal contribution

Policies trained in Single Mode environments

Policies trained in single-friction environments perform well in similar environments at evaluation time. However, they fail when multiple friction coefficients are present in the same environment, because they cannot account for switches between modes.

This motivates training in environments with mode transitions.

Friction 1.0

Rollout on single-friction (1.0) environments.
Trajectory and Total Rewards Plot.
Track Progress Rewards.

Friction 0.3

Rollout on single-friction (0.3) environments.
Trajectory and Total Rewards Plot.
Track Progress Rewards.

Hybrid Environment (Cyan 0.3 / Gray 1.0)

Hybrid evaluation: Cyan box ⇒ Friction = 0.3; elsewhere Friction = 1.0.

Demonstrating the importance of hybrid dynamics for legged robotic systems (Hopper with Contact Dynamics)

Friction = 1.0
Friction = 0.3
Hybrid environment: grey = 1.0, red = 0.3

Switching Policy Failure (Toy Example)

Kinematic Bicycle Model.
Parametric Perturbation.
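
As a rough illustration of the toy setup, the sketch below steps a standard kinematic bicycle model with forward Euler and exposes a `mult` argument so that `mult=0` recovers the nominal model. The page does not say which parameter is perturbed, so the steering bias used here (`steer_bias`), the wheelbase, and the time step are all hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    x: float    # rear-axle position along x [m]
    y: float    # rear-axle position along y [m]
    yaw: float  # heading angle [rad]
    v: float    # longitudinal speed [m/s]


def bicycle_step(state, accel, steer, dt=0.05, wheelbase=2.5,
                 mult=0.0, steer_bias=0.02):
    """Advance the kinematic bicycle model by one forward-Euler step.

    ASSUMPTION: the perturbation is modeled as a steering bias scaled by
    `mult`; mult=0 gives the nominal model. The source does not specify this.
    """
    effective_steer = steer + mult * steer_bias
    x = state.x + state.v * math.cos(state.yaw) * dt
    y = state.y + state.v * math.sin(state.yaw) * dt
    yaw = state.yaw + state.v / wheelbase * math.tan(effective_steer) * dt
    v = state.v + accel * dt
    return BicycleState(x, y, yaw, v)


def rollout(state, controls, **model_kwargs):
    """Apply a sequence of (accel, steer) pairs and return all visited states."""
    states = [state]
    for accel, steer in controls:
        states.append(bicycle_step(states[-1], accel, steer, **model_kwargs))
    return states


start = BicycleState(x=0.0, y=0.0, yaw=0.0, v=1.0)
controls = [(0.0, 0.0)] * 20  # constant speed, zero steering command
nominal = rollout(start, controls, mult=0.0)
perturbed = rollout(start, controls, mult=6.0)
print(f"nominal final y:   {nominal[-1].y:.4f}")
print(f"perturbed final y: {perturbed[-1].y:.4f}")
```

Under these assumptions the same control sequence keeps the nominal model on a straight line while the perturbed model drifts sideways, which is the kind of mismatch a planner tuned for one parameter setting would have to compensate for.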

Perturbed Bicycle Model — Example MPC Outputs

Nominal vs. Perturbed

The nominal model (mult. 0) routes differently from the perturbed model (mult. 6).

Reversal Behavior under Perturbation

MPC output A: Mult. 6 suggests reversing from the initial pose.
MPC output B: Alternative rollouts under perturbation.
MPC output C: Additional rollout supporting reversal for performance.

RL Switching Policy

Policy that switches between parameter-specific experts.
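
A hard-switching policy of this kind can be sketched as a lookup that hands control to the expert whose training parameter is closest to the current one. This is only an illustration: `make_switching_policy`, the placeholder experts, and the nearest-parameter rule are assumptions, not the implementation used on this page.

```python
from typing import Callable, Dict, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def make_switching_policy(experts: Dict[float, Expert]):
    """Build a policy that delegates to the expert trained nearest to `param`.

    ASSUMPTION: experts are keyed by the scalar parameter (e.g. friction)
    they were trained on, and the current parameter is known at run time.
    """
    if not experts:
        raise ValueError("at least one expert is required")

    def policy(obs: Sequence[float], param: float) -> List[float]:
        nearest = min(experts, key=lambda trained: abs(trained - param))
        return experts[nearest](obs)

    return policy


# Placeholder experts that return fixed actions, for illustration only.
def high_friction_expert(obs):
    return [1.0, 0.0]


def low_friction_expert(obs):
    return [0.3, 0.0]


policy = make_switching_policy({1.0: high_friction_expert,
                                0.3: low_friction_expert})
obs = [0.0, 0.0, 0.0]
print(policy(obs, 0.3))   # handled by the low-friction expert
print(policy(obs, 0.95))  # handled by the high-friction expert
```

Note that the switch only looks at the current parameter value. Each expert was trained in a single mode, so nothing in this scheme prepares the system for the transition itself.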

Simulation Studies

Demonstrating the effect of training in Hybrid Environments

Hopper

Grey boxes – Friction = 1.0; Red boxes – Friction = 0.01.
SAC with ground-truth friction information.
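
One simple way to give an agent ground-truth friction information is to append the local friction coefficient to its observation vector. The sketch below assumes that mechanism; the page does not state how the information is actually supplied, and `augment_observation` and the placeholder observation are hypothetical.

```python
from typing import List, Sequence


def augment_observation(obs: Sequence[float], friction: float) -> List[float]:
    """Return a copy of `obs` with the friction coefficient appended.

    ASSUMPTION: ground-truth friction is exposed to the agent as one extra
    observation feature; the original observation is left unmodified.
    """
    return list(obs) + [float(friction)]


raw_obs = [0.0, 1.25, -0.1]  # placeholder state features
on_grey = augment_observation(raw_obs, 1.0)   # grey box: friction 1.0
on_red = augment_observation(raw_obs, 0.01)   # red box: friction 0.01
print(on_grey)
print(on_red)
```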

Autonomous Driving

SAC trained with ground-truth hybrid surface map.

Zero-Shot Generalization: SAC vs. MoEDiff

Autonomous Driving on unseen track layout

The friction coefficient is 0.5 everywhere except in the cyan boxes, where it is 0.3.
Left: SAC baseline struggles with previously unseen racetrack layouts and changing surfaces.
Right: Diffusion agents handle unseen layouts and surface changes more robustly.
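
The surface layout described above can be written as a position-dependent friction lookup. This is a minimal sketch: the `Box` type, the `friction_at` helper, and the box coordinates are hypothetical, and only the values 0.5 and 0.3 come from the description.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangular patch on the track (hypothetical layout)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def friction_at(x: float, y: float, low_friction_boxes: Iterable[Box],
                base: float = 0.5, low: float = 0.3) -> float:
    """Return `low` inside any low-friction box and `base` everywhere else."""
    if any(box.contains(x, y) for box in low_friction_boxes):
        return low
    return base


# Illustrative coordinates only; the real track layout is not given here.
cyan_boxes = [Box(2.0, 4.0, 0.0, 1.0), Box(8.0, 9.0, -1.0, 1.0)]
print(friction_at(3.0, 0.5, cyan_boxes))  # inside a cyan box
print(friction_at(0.0, 0.0, cyan_boxes))  # on the base surface
```

Each change in the returned value along a trajectory corresponds to a mode switch in the hybrid system.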

Hopper on unseen contact dynamics induced by terrain height variation.

Left: SAC baseline struggles with previously unseen contact dynamics.
Right: Diffusion agents handle unseen contact dynamics and are better able to stabilize the system.