Phase 3 – Replay Store & One-Click Replays
Phase 3 hardens FleetForge's replay surface so operators can reproduce any run, compare new prompts against recorded behaviour, and inspect immutable step attempt history.
What landed
- Immutable step attempts – every step completion now appends a row to the `step_attempts` table. Attempts capture the exact `StepSpec`, inputs, budget snapshot, guardrail events, tool emissions, artifacts, and recorded output for the attempt. Snapshots are append-only and keyed by `(run_id, step_id, attempt)`.
- Run event log – outbox events are mirrored to `run_event_log`, providing an audit trail that can be replayed or exported without draining the outbox.
- Replay API enhancements – `/ReplayRun` now consumes the immutable attempt store. `mode=full` returns the latest attempt snapshots and verifies the persisted output against the current step row. `mode=shadow` compares the two latest attempts (if present) so prompt/tool changes surface bit-for-bit deltas.
- UI/SDK wiring – the TypeScript SDK maps `shadow` requests to the new diff replay mode. The UI reuses the existing replay button and displays drift deltas when a replay falls outside tolerance.
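
The append-only keying and the shadow comparison described above can be sketched as a small in-memory model. This is an illustration only, not FleetForge's implementation: `AttemptStore`, `shadowCompare`, and every field other than the `(run_id, step_id, attempt)` key are hypothetical names, and the recorded output is assumed to be a serialized string so that "bit-for-bit" reduces to string equality.

```typescript
// Hypothetical in-memory model of the append-only attempt store.
// Only the (run_id, step_id, attempt) key comes from the notes above;
// all other names are assumptions made for illustration.
interface StepAttempt {
  runId: string;
  stepId: string;
  attempt: number; // increases with each new attempt of the same step
  output: string;  // recorded output, assumed to be serialized
}

class AttemptStore {
  private readonly rows = new Map<string, StepAttempt>();

  private static key(runId: string, stepId: string, attempt: number): string {
    return JSON.stringify([runId, stepId, attempt]);
  }

  // Append-only: an existing (runId, stepId, attempt) key is never overwritten.
  append(row: StepAttempt): void {
    const k = AttemptStore.key(row.runId, row.stepId, row.attempt);
    if (this.rows.has(k)) {
      throw new Error(`attempt already recorded: ${k}`);
    }
    this.rows.set(k, Object.freeze({ ...row }));
  }

  // Returns up to `limit` attempts for one step, newest first.
  latest(runId: string, stepId: string, limit: number): StepAttempt[] {
    return [...this.rows.values()]
      .filter((r) => r.runId === runId && r.stepId === stepId)
      .sort((a, b) => b.attempt - a.attempt)
      .slice(0, limit);
  }
}

interface ShadowResult {
  recorded: string;
  current: string;
  drifted: boolean;
}

// Shadow comparison: diff the two latest attempts. With a single attempt,
// fall back to that snapshot and report zero drift.
function shadowCompare(
  store: AttemptStore,
  runId: string,
  stepId: string,
): ShadowResult | undefined {
  const attempts = store.latest(runId, stepId, 2);
  if (attempts.length === 0) return undefined;
  const newest = attempts[0];
  if (attempts.length === 1) {
    return { recorded: newest.output, current: newest.output, drifted: false };
  }
  const previous = attempts[1];
  return {
    recorded: previous.output,
    current: newest.output,
    drifted: previous.output !== newest.output,
  };
}
```

Rejecting duplicate keys on `append` rather than overwriting is what keeps earlier attempts available for comparison; a real table would typically enforce the same rule with a primary key or unique constraint.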
Storage additions
Migration `0013_replay_store.sql` introduces:

- `step_attempts` – append-only attempt metadata (see above).
- `run_event_log` – immutable mirror of the outbox stream.

Run `sqlx migrate run` (or `just db-migrate`) after pulling these changes.
Operational notes
- Replay responses now embed attempt metadata alongside the recorded/current payload. Consumers should read `drift.recorded`/`drift.current` for the annotated output and deltas.
- Shadow replays (`mode=shadow`) require at least two attempts to surface deltas; otherwise the response falls back to the current snapshot with zero drift.
- The underlying gRPC strategy remains `mocked` for both modes; live replays can slot into the new attempt store in a future phase.
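
As a rough consumer-side sketch of the first two notes, the snippet below picks out the steps whose recorded and current payloads differ. The response shape is an assumption: only `drift.recorded` and `drift.current` are named in this document, while `ReplayStepResult`, `stepId`, `attempt`, and `driftedSteps` are hypothetical. Equality is checked by comparing JSON serializations, which is sensitive to key order and stands in for whatever tolerance logic the real SDK applies.

```typescript
// Hypothetical per-step replay result. Only drift.recorded and
// drift.current are taken from the notes above; the rest is assumed.
interface ReplayStepResult {
  stepId: string;
  attempt: number;
  drift: {
    recorded: unknown;
    current: unknown;
  };
}

// Returns the ids of steps whose recorded and current payloads differ.
// A shadow replay with fewer than two attempts falls back to the current
// snapshot, so recorded equals current and the step is not reported.
function driftedSteps(results: ReplayStepResult[]): string[] {
  return results
    .filter(
      (r) => JSON.stringify(r.drift.recorded) !== JSON.stringify(r.drift.current),
    )
    .map((r) => r.stepId);
}

// Example with made-up data: the first step drifted, the second did not.
const example: ReplayStepResult[] = [
  { stepId: "plan", attempt: 2, drift: { recorded: { text: "v1" }, current: { text: "v2" } } },
  { stepId: "fetch", attempt: 1, drift: { recorded: { text: "ok" }, current: { text: "ok" } } },
];
const changed = driftedSteps(example);
```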