DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious-Play

University of Waterloo

🎥 Supplementary Videos

MPE — Adversary Model Stochasticity

With the seed fixed, we evaluate three times to show the model's inherent stochasticity. DiffFSP learns more diverse strategies.

MPE — Tag Predator–Prey

Qualitative predator–prey results under sparse rewards show competitive gameplay.

RaceTrack — Robustness to Unseen Opponents

Trained agents (yellow) vs unseen agents (blue). Agents learn to set up overtakes through cornering; several exhibit block-pass behavior.

RaceTrack — Robustness: Failure Modes

Left: QSMFSP fails a lane change, rear-ending the opponent (local observations only). Right: DiffFSP infers agents ahead and briefly violates track boundaries to overtake.

RaceTrack — Overtake (1)

The attacker overtakes at a turn and immediately blocks to prevent re-overtake.

RaceTrack — Overtake (2)

Another strategic overtake on a curve followed by a blocking maneuver.

RaceTrack — Overtake (1v1)

1v1 overtake at a curve with immediate defensive positioning.

RaceTrack — Block (1)

The defender executes sustained blocking to prevent an overtake.

RaceTrack — Defensive Driving (1)

The attacker keeps its distance and matches the defender's speed, making occasional shoulder checks without attempting an overtake.

RaceTrack — Overtake Fail (1)

The attacker aborts an overtake, braking late to avoid a rear-end collision.

RaceTrack — Brake Check (Follower)

The defender performs a brake check; the attacker reacts defensively to avoid collision.

Read the Paper