DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious-Play

University of Waterloo

🎥 Supplementary Videos

MPE — Adversary Model Stochasticity

MPE Adversary episode 1
MPE Adversary episode 2
MPE Adversary episode 3
With the seed fixed, we evaluate three times to show the model's inherent stochasticity. DiffFSP learns more diverse strategies.

MPE — Tag Predator–Prey

MPE Tag 105
MPE Tag 106
MPE Tag 108
Qualitative predator–prey results under sparse rewards show competitive gameplay.

RaceTrack — Robustness to Unseen Opponents

Block pass 1
Block pass 2
Block pass 3
Trained agents (yellow) vs. unseen agents (blue). Agents learn to set up overtakes through cornering; several exhibit block-pass behavior.

RaceTrack — Robustness: Failure Modes

QSMFSP failure
DiffFSP rule violation
Left: QSMFSP fails a lane change, rear-ending the opponent (local observations only). Right: DiffFSP infers agents ahead and briefly violates track boundaries to overtake.

RaceTrack — Overtake (1)

Overtake 1
The attacker overtakes at a turn and immediately blocks to prevent re-overtake.

RaceTrack — Overtake (2)

Overtake 2
Another strategic overtake on a curve followed by a blocking maneuver.

RaceTrack — Overtake (1v1)

1v1 Overtake
1v1 overtake at a curve with immediate defensive positioning.

RaceTrack — Block (1)

Blocking
The defender executes sustained blocking to prevent an overtake.

RaceTrack — Defensive Driving (1)

Defensive driving
The attacker keeps its distance and matches the defender's speed, making occasional shoulder checks without attempting an overtake.

RaceTrack — Overtake Fail (1)

Overtake fail rear end
The attacker aborts an overtake, braking late to avoid a rear-end collision.

RaceTrack — Brake Check (Follower)

Brake check follower
The defender performs a brake check; the attacker reacts defensively to avoid collision.