
Phase 1 – Deterministic Runtime

Phase 1 extends FleetForge's execution core so operators can treat agent runs like deterministic, observable jobs rather than best-effort tasks. The key additions are:

  • Execution controls – every step may now include an execution block in the step spec with runtime semantics:
    • max_attempts counts deterministic retries before the scheduler marks a step failed.
    • retry_backoff_ms + retry_backoff_factor drive exponential back-off; retries are parked by setting not_before on the step row.
    • deadline_ms lets the scheduler fail a step if it cannot start before a deadline.
    • compensation.step references a secondary step to queue automatically when the primary step reaches a terminal failure.
    • cost_hint.{tokens,usd} seed budget reservations and the new back-pressure gates.
  • Back-pressure – SchedulerConfig accepts max_inflight_tokens / max_inflight_cost caps. When new work would exceed a cap, the scheduler defers the step with a step_deferred outbox event instead of over-committing workers.
  • Checkpoint/restore – StepExecutionResult::with_checkpoint persists arbitrary JSON snapshots. Steps expose the last checkpoint via QueuedStep.checkpoint, and the execution context receives it through StepCtx.
  • Cost ledger – every attempt (success, failure, retry scheduling, compensation) is recorded in step_cost_ledger. The ledger captures reserved vs. actual token and cost usage for downstream billing or auditing pipelines.
  • Compensation scheduling – when a step defines a compensation target, the scheduler flips the target to queued with pending_dependencies = 0 after the final failure and marks compensation_scheduled = true on the origin step.
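
The retry controls listed above can be illustrated with a small sketch of how a not_before timestamp might be derived from retry_backoff_ms and retry_backoff_factor. This is not FleetForge's scheduler code: the RetryPolicy struct and the retry_delay and not_before functions are hypothetical names, and the sketch assumes that max_attempts counts total attempts (including the first) and that the delay grows as retry_backoff_ms * retry_backoff_factor^(attempt - 1).

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical mirror of the retry fields in a step's execution block.
/// The field names follow the docs; the struct itself is an assumption.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub retry_backoff_ms: u64,
    pub retry_backoff_factor: f64,
}

/// Delay to wait after failed attempt number `attempt` (1-based).
/// Returns None when no retry remains, i.e. the step would be marked failed.
/// Assumption: max_attempts counts total attempts, including the first.
pub fn retry_delay(policy: &RetryPolicy, attempt: u32) -> Option<Duration> {
    if attempt == 0 || attempt >= policy.max_attempts {
        return None;
    }
    // Exponential back-off: base * factor^(attempt - 1).
    let scale = policy.retry_backoff_factor.powi((attempt - 1) as i32);
    let ms = (policy.retry_backoff_ms as f64 * scale).round() as u64;
    Some(Duration::from_millis(ms))
}

/// Timestamp a retry would be parked until, mirroring the not_before column.
pub fn not_before(policy: &RetryPolicy, attempt: u32, now: SystemTime) -> Option<SystemTime> {
    retry_delay(policy, attempt).map(|delay| now + delay)
}
```

Under these assumptions, a policy with max_attempts = 4, retry_backoff_ms = 500 and retry_backoff_factor = 2.0 waits 500 ms, 1000 ms and 2000 ms after the first three failures and schedules no retry after the fourth.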

See the Step schema (source api/schemas/step.json) for the extended specification and core/storage/migrations/0012_execution_controls.sql for the backing storage changes.
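
The back-pressure gate described above can be sketched as a pure admission check that compares in-flight usage plus a step's cost_hint against the configured caps. The types below (Caps, Inflight, CostHint, Admission) and the admit function are hypothetical stand-ins rather than FleetForge's actual SchedulerConfig API. The sketch also assumes that an unset cap means "unlimited" and that a step landing exactly on a cap is still admitted.

```rust
/// Hypothetical subset of the scheduler configuration; None means no cap.
#[derive(Debug, Clone, Default)]
pub struct Caps {
    pub max_inflight_tokens: Option<u64>,
    pub max_inflight_cost: Option<f64>,
}

/// Reservations currently held by running steps (assumed bookkeeping).
#[derive(Debug, Clone, Copy, Default)]
pub struct Inflight {
    pub tokens: u64,
    pub cost: f64,
}

/// Mirrors cost_hint.{tokens,usd} from the step's execution block.
#[derive(Debug, Clone, Copy)]
pub struct CostHint {
    pub tokens: u64,
    pub usd: f64,
}

#[derive(Debug, PartialEq)]
pub enum Admission {
    /// The step fits under every configured cap and may be dispatched.
    Admit,
    /// The step would exceed a cap; a real scheduler would emit step_deferred here.
    Defer,
}

/// Decide whether a step may start without exceeding any configured cap.
pub fn admit(caps: &Caps, inflight: &Inflight, hint: &CostHint) -> Admission {
    let tokens_ok = caps
        .max_inflight_tokens
        .map_or(true, |cap| inflight.tokens.saturating_add(hint.tokens) <= cap);
    let cost_ok = caps
        .max_inflight_cost
        .map_or(true, |cap| inflight.cost + hint.usd <= cap);
    if tokens_ok && cost_ok {
        Admission::Admit
    } else {
        Admission::Defer
    }
}
```

Keeping the check free of side effects makes the deferral decision easy to test in isolation; emitting the step_deferred outbox event and updating reservations would happen in the caller.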