Learning Adaptive Diffusion Policies for Hybrid Dynamical Systems

Leroy D'Souza^†, Akash Karthikeyan^†, Yash Vardhan Pant, Sebastian Fischmeister

University of Waterloo

† Equal contribution


Policies trained in Single Mode environments

Policies trained in single-friction environments perform well when evaluated on similar environments. However, they fail when multiple friction coefficients are present in the same environment, since they cannot account for switches between modes.

This motivates training in environments with mode transitions.

Friction 1.0

Rollout on single-friction (1.0) environments.

Trajectory and Total Rewards Plot.

Track Progress Rewards.

Friction 0.3

Rollout on single-friction (0.3) environments.

Trajectory and Total Rewards Plot.

Track Progress Rewards.

Hybrid Environment (Cyan 0.3 / Gray 1.0)

Hybrid evaluation: Cyan box ⇒ Friction = 0.3; elsewhere Friction = 1.0.
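The hybrid setup can be read as a position-dependent mode: the friction coefficient depends on which region of the track the vehicle occupies. Below is a minimal sketch of such a lookup, not the authors' environment code. The `Box` class, the `friction_at` function, and the axis-aligned box shape are all hypothetical, and the default coefficients follow the section heading (cyan 0.3, gray 1.0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned region with its own friction coefficient."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def friction_at(x, y, low_friction_boxes, low=0.3, nominal=1.0):
    """Return the friction coefficient active at position (x, y).

    Assumption: a single low-friction value inside the boxes and a single
    nominal value everywhere else.
    """
    for box in low_friction_boxes:
        if box.contains(x, y):
            return low
    return nominal


# Usage: one low-friction patch on an otherwise nominal surface.
patch = Box(0.0, 1.0, 0.0, 1.0)
inside = friction_at(0.5, 0.5, [patch])
outside = friction_at(2.0, 2.0, [patch])
```

A policy that only ever saw one of the two values in training never experiences the jump in friction that occurs when the vehicle crosses a box boundary.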

Demonstrating the importance of hybrid dynamics for legged robotic systems (Hopper with Contact Dynamics)

Friction = 1.0

Friction = 0.3

Hybrid environment: grey = 1.0, red = 0.3

Switching Policy Failure (Toy Example)

Kinematic Bicycle Model (toy example).

Parametric Perturbation (toy example).
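For readers unfamiliar with the toy model, the following is a minimal sketch of a standard rear-axle kinematic bicycle model with a forward-Euler step. It is illustrative only. This page does not state which parameter is perturbed or how the multiplier enters, so the sketch assumes the wheelbase is scaled by `1 + step * mult`, which makes `mult = 0` the nominal model. The function names, the wheelbase of 2.5, the time step, and `step = 0.1` are all hypothetical.

```python
import math


def bicycle_step(state, accel, steer, wheelbase=2.5, dt=0.1):
    """One forward-Euler step of the rear-axle kinematic bicycle model.

    state = (x, y, yaw, v); accel is longitudinal acceleration and
    steer is the front-wheel steering angle in radians.
    """
    x, y, yaw, v = state
    x_next = x + v * math.cos(yaw) * dt
    y_next = y + v * math.sin(yaw) * dt
    yaw_next = yaw + (v / wheelbase) * math.tan(steer) * dt
    v_next = v + accel * dt
    return (x_next, y_next, yaw_next, v_next)


def perturbed_wheelbase(nominal, mult, step=0.1):
    """Assumed perturbation: scale the wheelbase; mult = 0 is nominal."""
    return nominal * (1.0 + step * mult)


# Usage: the same controls produce different headings under perturbation.
start = (0.0, 0.0, 0.0, 1.0)
nominal_next = bicycle_step(start, accel=0.0, steer=0.2,
                            wheelbase=perturbed_wheelbase(2.5, 0))
perturbed_next = bicycle_step(start, accel=0.0, steer=0.2,
                              wheelbase=perturbed_wheelbase(2.5, 6))
```

Under this assumed perturbation, a longer effective wheelbase reduces the yaw rate for the same steering angle, which is one way a planner built on the perturbed model could prefer a different route from the nominal one.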

Perturbed Bicycle Model — Example MPC Outputs

Nominal vs. Perturbed

The nominal model (mult. 0) routes differently from the perturbed model (mult. 6).

Reversal Behavior under Perturbation

MPC output A: mult. 6 suggests reversing from the initial pose.

MPC output B: alternative rollouts under perturbation.

MPC output C: additional rollout supporting reversal for performance.

RL Switching Policy

Switching policy: a policy that switches between parameter-specific experts.
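Structurally, a switching policy of this kind pairs a high-level selector with a set of experts, each tuned for one parameter value. The sketch below shows only that structure and is not the authors' implementation. The section heading suggests the switch is learned with RL, whereas here a hand-written rule stands in for it. `make_switching_policy`, the dictionary keys, and the observation layout are hypothetical.

```python
def make_switching_policy(experts, selector):
    """Compose parameter-specific experts with a high-level selector.

    experts:  dict mapping a parameter value to a callable obs -> action
    selector: callable obs -> key of the expert to use for this step
    """
    def policy(obs):
        key = selector(obs)
        return experts[key](obs)
    return policy


# Usage with placeholder experts that return a fixed throttle command.
experts = {
    0.3: lambda obs: {"throttle": 0.2},  # cautious expert for low friction
    1.0: lambda obs: {"throttle": 0.8},  # aggressive expert for high friction
}


def nearest_friction(obs):
    # Stand-in selector: pick the expert whose friction is closest to an
    # (assumed available) friction estimate stored in the observation.
    return min(experts, key=lambda mu: abs(mu - obs["friction_estimate"]))


policy = make_switching_policy(experts, nearest_friction)
action_low = policy({"friction_estimate": 0.35})
action_high = policy({"friction_estimate": 0.9})
```

Because each expert is optimal only for its own parameter value, nothing in this composition prepares the system for the transition itself.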

Simulation Studies

Demonstrating the effect of training in Hybrid Environments

Hopper

Grey boxes – Friction = 1.0; Red boxes – Friction = 0.01.

SAC with ground-truth friction information.

Autonomous Driving

SAC trained with ground-truth hybrid surface map.

Zero-Shot Generalization: SAC vs. MoEDiff

Autonomous Driving on unseen track layout

Friction coefficient is 0.5 everywhere except the cyan boxes where it is 0.3.

Left: SAC baseline struggles with previously unseen racetrack layouts and changing surfaces.

Right: Diffusion agents handle unseen layouts and surface changes more robustly.

Hopper on unseen contact dynamics induced by terrain height variation.

Left: SAC baseline struggles with previously unseen contact dynamics.

Right: Diffusion agents handle unseen contact dynamics and are better able to stabilize the system.