Phase 3 – Replay Store & One-Click Replays

Phase 3 hardens FleetForge's replay surface so operators can reproduce any run, compare new prompts against recorded behaviour, and inspect immutable step attempt history.

What landed

  • Immutable step attempts – every step completion now appends a row to the step_attempts table. Each attempt captures the exact StepSpec, inputs, budget snapshot, guardrail events, tool emissions, artifacts, and recorded output. Rows are append-only and keyed by (run_id, step_id, attempt).
  • Run event log – outbox events are mirrored to run_event_log, providing an audit trail that can be replayed or exported without draining the outbox.
  • Replay API enhancements – ReplayRun now consumes the immutable attempt store. mode=full returns the latest attempt snapshots and verifies the persisted output against the current step row. mode=shadow compares the two latest attempts (if present) so prompt/tool changes surface bit-for-bit deltas.
  • UI/SDK wiring – the TypeScript SDK maps shadow requests to the new diff replay mode. The UI reuses the existing replay button and displays drift deltas when a replay falls outside tolerance.
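
To make the append-only keying concrete, the following is a minimal in-memory sketch of an attempt store keyed by (run_id, step_id, attempt). It is not FleetForge's implementation: the AttemptStore class, its append and latest methods, and the reduced StepAttempt shape are hypothetical names chosen for illustration, and the real step_attempts table carries more columns than shown here.

```typescript
// Hypothetical, simplified model of a row in step_attempts.
// Only the key fields and the recorded output are represented.
interface StepAttempt {
  readonly runId: string;
  readonly stepId: string;
  readonly attempt: number; // 1-based, assigned by the store on append
  readonly output: unknown;
}

// Illustrative append-only store: rows are added, never updated or deleted.
class AttemptStore {
  private readonly rows = new Map<string, StepAttempt>();
  private readonly counts = new Map<string, number>();

  // Appends the next attempt for (runId, stepId) and returns the frozen row.
  append(runId: string, stepId: string, output: unknown): StepAttempt {
    const stepKey = JSON.stringify([runId, stepId]);
    const attempt = (this.counts.get(stepKey) ?? 0) + 1;
    const row: StepAttempt = Object.freeze({ runId, stepId, attempt, output });
    this.rows.set(JSON.stringify([runId, stepId, attempt]), row);
    this.counts.set(stepKey, attempt);
    return row;
  }

  // Returns up to `n` of the most recent attempts, newest first.
  latest(runId: string, stepId: string, n = 1): StepAttempt[] {
    const count = this.counts.get(JSON.stringify([runId, stepId])) ?? 0;
    const out: StepAttempt[] = [];
    for (let a = count; a > 0 && out.length < n; a--) {
      out.push(this.rows.get(JSON.stringify([runId, stepId, a]))!);
    }
    return out;
  }
}
```

In this sketch a full replay would read latest(runId, stepId, 1), while a shadow replay would read latest(runId, stepId, 2) and compare the two rows. Because each append only ever creates a new key, earlier attempts remain available for comparison.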

Storage additions

Migration 0013_replay_store.sql introduces:

  • step_attempts – append-only attempt metadata (see above).
  • run_event_log – immutable mirror of the outbox stream.

Run sqlx migrate run (or just db-migrate) after pulling these changes.

Operational notes

  • Replay responses now embed attempt metadata alongside the recorded/current payload. Consumers should read drift.recorded / drift.current for the annotated output and deltas.
  • Shadow replays (mode=shadow) require at least two attempts to surface deltas; otherwise the response falls back to the current snapshot with zero drift.
  • The underlying gRPC strategy remains mocked for both modes; live replays can slot into the new attempt store in a future phase.
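
The shadow fallback described in these notes can be sketched as follows. This is an assumption-laden illustration, not the service's actual logic: the shadowReplay function, the AttemptSnapshot and Delta types, the line-level comparison, and the deltas field are all hypothetical. The sketch also assumes that the older of the two latest attempts plays the role of drift.recorded and the newest plays drift.current.

```typescript
// Hypothetical snapshot shape: attempt number plus recorded output text.
interface AttemptSnapshot {
  attempt: number;
  output: string;
}

// Hypothetical per-line delta; the real response format may differ.
interface Delta {
  line: number; // 1-based line number
  recorded: string | null;
  current: string | null;
}

interface Drift {
  recorded: string;
  current: string;
  deltas: Delta[];
}

// Compares the two latest attempts. With only one attempt, the snapshot is
// compared against itself, which yields zero drift.
function shadowReplay(attempts: AttemptSnapshot[]): Drift {
  if (attempts.length === 0) {
    throw new Error("no attempts recorded for this step");
  }
  const sorted = [...attempts].sort((a, b) => b.attempt - a.attempt);
  const current = sorted[0];
  const recorded = sorted[1] ?? current; // fallback when fewer than two attempts exist

  const recLines = recorded.output.split("\n");
  const curLines = current.output.split("\n");
  const deltas: Delta[] = [];
  const n = Math.max(recLines.length, curLines.length);
  for (let i = 0; i < n; i++) {
    if (recLines[i] !== curLines[i]) {
      deltas.push({
        line: i + 1,
        recorded: recLines[i] ?? null,
        current: curLines[i] ?? null,
      });
    }
  }
  return { recorded: recorded.output, current: current.output, deltas };
}
```

A consumer following this sketch would treat an empty deltas list as "no drift", which covers both an unchanged output and the single-attempt fallback.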