Phase 1 – Deterministic Runtime
Phase 1 extends FleetForge's execution core so operators can treat agent runs like deterministic, observable jobs rather than best-effort tasks. The key additions are:
- Execution controls – every step may now include an `execution` block in the step spec with runtime semantics:
  - `max_attempts` counts deterministic retries before the scheduler marks a step failed.
  - `retry_backoff_ms` + `retry_backoff_factor` drive exponential back-off; retries are parked by setting `not_before` on the step row.
  - `deadline_ms` lets the scheduler fail a step if it cannot start before a deadline.
  - `compensation.step` references a secondary step to queue automatically when the primary step reaches a terminal failure.
  - `cost_hint.{tokens,usd}` seed budget reservations and the new back-pressure gates.
- Back-pressure – `SchedulerConfig` accepts `max_inflight_tokens` / `max_inflight_cost` caps. When new work would exceed a cap, the scheduler defers the step with a `step_deferred` outbox event instead of over-committing workers.
- Checkpoint/restore – `StepExecutionResult::with_checkpoint` persists arbitrary JSON snapshots. Steps expose the last checkpoint via `QueuedStep.checkpoint`, and the execution context receives it through `StepCtx`.
- Cost ledger – every attempt (success, failure, retry scheduling, compensation) is recorded in `step_cost_ledger`. The ledger captures reserved vs. actual token and cost usage for downstream billing or auditing pipelines.
- Compensation scheduling – when a step defines a compensation target, the scheduler flips the target to `queued` with `pending_dependencies = 0` after the final failure and marks `compensation_scheduled = true` on the origin step.
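
The back-pressure rule in the list above can be illustrated with a minimal sketch. It is not FleetForge's actual scheduler code: the `Inflight` and `Admission` types, the `admit` function, and the choice of `Option` fields (with `None` meaning "no cap") are assumptions made for illustration. Only the names `SchedulerConfig`, `max_inflight_tokens`, `max_inflight_cost`, `cost_hint.{tokens,usd}`, and `step_deferred` come from the text.

```rust
/// Hypothetical subset of `SchedulerConfig`; `None` is assumed to mean "no cap".
#[derive(Debug, Clone, Default)]
pub struct SchedulerConfig {
    pub max_inflight_tokens: Option<u64>,
    pub max_inflight_cost: Option<f64>,
}

/// Mirrors `cost_hint.{tokens,usd}` from the step spec.
#[derive(Debug, Clone, Copy, Default)]
pub struct CostHint {
    pub tokens: u64,
    pub usd: f64,
}

/// Hypothetical running total of budget reserved by in-flight steps.
#[derive(Debug, Clone, Copy, Default)]
pub struct Inflight {
    pub tokens: u64,
    pub usd: f64,
}

/// Outcome of the admission check (illustrative names).
#[derive(Debug, PartialEq)]
pub enum Admission {
    /// The step fits under both caps and its hint has been reserved.
    Dispatch,
    /// The step would exceed a cap; the caller would emit `step_deferred`.
    Defer,
}

/// Reserve the step's cost hint if doing so stays within both caps;
/// otherwise leave the in-flight totals untouched and defer the step.
pub fn admit(cfg: &SchedulerConfig, inflight: &mut Inflight, hint: CostHint) -> Admission {
    let tokens_after = inflight.tokens.saturating_add(hint.tokens);
    let usd_after = inflight.usd + hint.usd;

    let over_tokens = cfg.max_inflight_tokens.map_or(false, |cap| tokens_after > cap);
    let over_cost = cfg.max_inflight_cost.map_or(false, |cap| usd_after > cap);

    if over_tokens || over_cost {
        return Admission::Defer;
    }
    inflight.tokens = tokens_after;
    inflight.usd = usd_after;
    Admission::Dispatch
}
```

In this sketch a step that lands exactly on a cap is still admitted; whether the real scheduler treats the cap as inclusive is not stated in the text.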
See the Step schema (source: `api/schemas/step.json`) for the extended specification and
`core/storage/migrations/0012_execution_controls.sql` for the backing storage changes.
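
To make the retry fields of the `execution` block concrete, here is a rough sketch of how a back-off delay and the resulting `not_before` value could be derived. The `RetryPolicy` and `RetryDecision` types are hypothetical, and two points are assumptions rather than documented behaviour: the delay formula `retry_backoff_ms * retry_backoff_factor^(attempt - 1)`, and `max_attempts` being read as the total number of attempts including the first. The Step schema remains the authoritative definition.

```rust
use std::time::Duration;

/// Hypothetical mirror of the retry-related fields of the `execution` block.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub retry_backoff_ms: u64,
    pub retry_backoff_factor: f64,
}

/// What a scheduler might do after a failed attempt (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum RetryDecision {
    /// Park the step by setting `not_before` to this epoch-millisecond value.
    RetryAt { not_before_ms: u64 },
    /// Attempts are exhausted: the step reaches terminal failure.
    Fail,
}

impl RetryPolicy {
    /// Delay before the retry that follows `failed_attempt` (1-based).
    /// Assumed formula: retry_backoff_ms * retry_backoff_factor^(failed_attempt - 1).
    pub fn backoff(&self, failed_attempt: u32) -> Duration {
        let exponent = failed_attempt.saturating_sub(1) as i32;
        let millis = self.retry_backoff_ms as f64 * self.retry_backoff_factor.powi(exponent);
        // Float-to-integer casts saturate in Rust, so huge delays clamp to u64::MAX.
        Duration::from_millis(millis.round() as u64)
    }

    /// Decide between parking the step for a retry and failing it for good.
    pub fn decide(&self, failed_attempt: u32, now_ms: u64) -> RetryDecision {
        if failed_attempt >= self.max_attempts {
            RetryDecision::Fail
        } else {
            let delay_ms = self.backoff(failed_attempt).as_millis() as u64;
            RetryDecision::RetryAt {
                not_before_ms: now_ms.saturating_add(delay_ms),
            }
        }
    }
}
```

Because the delay depends only on the attempt number and the spec fields, with no jitter, replaying the same failure sequence yields the same `not_before` offsets. This fits the deterministic-retry framing, though the real scheduler may compute delays differently.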