Policy System Overview

Policies let you describe what’s allowed in a workflow (budget, tools, data, and network paths) and later show proof of why something was allowed or denied.

FleetForge enforces policy at multiple layers so every agent run stays within approved safety, cost, and compliance boundaries. This page explains how the policy engine works and how to reason about guardrails in production.

Layers

  1. Step guardrails (policy.guardrails[]): Per-step flags that enable prompt injection filtering, PII controls, sandbox hints, and HTTP allowlists.
  2. Policy packs (FLEETFORGE_POLICY_PACK): Runtime-wide bundles (HIPAA, GDPR, OWASP demo, allow_all) that configure default sandboxes, tool/image allowlists, and regulated behaviours.
  3. Budget policies: Caps enforced by the scheduler that fail steps when reserved or actual spend crosses configured thresholds.
  4. ChangeOps gates: Release checks that use telemetry and eval results to block unreviewed changes before shipping (see ChangeOps concept).

Budget & SLO contracts

Every step spec may declare both a spend ceiling and an SLO tier inside the policy block. The budget values gate how many tokens/cost units the scheduler is willing to reserve up front, while the SLO metadata drives queue ordering and telemetry.

"policy": {
  "budget": {
    "tokens": 200000,
    "cost": 3.00
  },
  "slo": {
    "tier": "gold",
    "queue_target_ms": 5000,
    "priority_boost": 2,
    "error_budget_ratio": 0.20
  }
}
  • budget.tokens / budget.cost – maximum reservation for the step. When set, the scheduler calls Storage::try_consume_budget before execution and fails the step with kind=budget_exceeded if the run has already exhausted its quota. Absent values fall back to execution.cost_hint or the adapter inputs (token_estimate, cost_estimate).
  • slo.queue_target_ms – desired queue latency. The scheduler continuously tracks observed_queue_ms, computes slack (queue_target_ms - observed_queue_ms), and prioritises breached contracts (negative slack) ahead of best-effort work.
  • slo.priority_boost – additional priority offset applied during ranking so tiers can jump the queue even before breaching.
  • slo.tier / slo.error_budget_ratio – descriptive metadata captured in telemetry and outbox events so TrustOps scorecards can group breaches per contract.
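The budget fallback and reservation behaviour described above can be sketched as follows. This is a minimal illustration, not the scheduler's implementation: RunBudget is a hypothetical in-memory stand-in for Storage::try_consume_budget, the field names inside execution.cost_hint ("tokens", "cost") are assumed, and so is the order in which cost_hint is tried before the adapter inputs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunBudget:
    """Hypothetical stand-in for the run quota behind Storage::try_consume_budget."""
    tokens_left: int
    cost_left: float

    def try_consume(self, tokens: int, cost: float) -> bool:
        # Reserve only if both dimensions fit; otherwise leave the quota untouched.
        if tokens > self.tokens_left or cost > self.cost_left:
            return False
        self.tokens_left -= tokens
        self.cost_left -= cost
        return True


def first_present(*values):
    """Return the first value that is not None, or 0 if none is set."""
    for value in values:
        if value is not None:
            return value
    return 0


def resolve_reservation(step: dict) -> Tuple[int, float]:
    # Assumed precedence: policy.budget, then execution.cost_hint, then adapter inputs.
    budget = step.get("policy", {}).get("budget", {})
    hint = step.get("execution", {}).get("cost_hint", {})  # field names assumed
    inputs = step.get("inputs", {})
    tokens = first_present(budget.get("tokens"), hint.get("tokens"), inputs.get("token_estimate"))
    cost = first_present(budget.get("cost"), hint.get("cost"), inputs.get("cost_estimate"))
    return tokens, cost


def schedule_step(step: dict, run_budget: RunBudget) -> dict:
    tokens, cost = resolve_reservation(step)
    if not run_budget.try_consume(tokens, cost):
        # Mirrors the documented failure kind for an exhausted quota.
        return {"status": "failed", "kind": "budget_exceeded"}
    return {"status": "reserved", "tokens": tokens, "cost": cost}
```

With a run quota of 250,000 tokens, the first step using the example policy above reserves 200,000 and a second identical step fails with kind=budget_exceeded without changing the remaining quota.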

The runtime emits fleetforge.slo.queue_slack_ms and fleetforge.slo.preemptions metrics whenever a contracted step is evaluated, so dashboards can visualise queue health and throttling behaviour. Budget snapshots and SLO state are also attached to every step_attempt row and run event payload, feeding ChangeOps’ “budget scorecard”.
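To make the slack value behind fleetforge.slo.queue_slack_ms concrete, the sketch below computes slack as target minus observed wait and ranks queued steps. The QueuedStep type and the exact tie-breaking order (breached first, then higher priority_boost, then least slack) are assumptions for illustration; the real scheduler's ranking may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedStep:
    name: str
    observed_queue_ms: int
    queue_target_ms: Optional[int] = None  # None means best-effort (no SLO contract)
    priority_boost: int = 0

    @property
    def slack_ms(self) -> Optional[int]:
        # Slack is target minus wait; negative slack means the contract is breached.
        if self.queue_target_ms is None:
            return None
        return self.queue_target_ms - self.observed_queue_ms


def rank_key(step: QueuedStep):
    slack = step.slack_ms
    breached = slack is not None and slack < 0
    # Ascending sort: breached contracts first, then larger boosts, then least slack.
    return (
        0 if breached else 1,
        -step.priority_boost,
        slack if slack is not None else float("inf"),
    )


def rank(steps: List[QueuedStep]) -> List[QueuedStep]:
    return sorted(steps, key=rank_key)
```

For example, a gold step that has waited 6000 ms against a 5000 ms target has a slack of -1000 ms and sorts ahead of both an unbreached contracted step and any best-effort work.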

Runtime knobs

  • FLEETFORGE_MAX_INFLIGHT_TOKENS – caps the total reserved tokens across in-flight steps; the scheduler defers additional work once the limit is hit.
  • FLEETFORGE_MAX_INFLIGHT_COST – identical control for USD cost reservations.
  • FLEETFORGE_QUEUE_BACKPRESSURE_MS – overrides the default 50 ms delay that backpressure deferrals add to a step’s not_before timestamp.

Leaving these variables unset keeps the previous behaviour: no in-flight cap and a 50 ms defer window. All three knobs are read in core/runtime/src/scheduler.rs, so Helm charts and the toolbox can set them per environment.
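The interaction between the in-flight caps and the backpressure delay can be sketched like this. The function names and the returned dictionary shape are hypothetical, and the sketch assumes a step is deferred when admitting it would push the reserved total past a cap; the scheduler may instead defer only once the cap has already been reached.

```python
import os

DEFAULT_BACKPRESSURE_MS = 50  # documented default defer window


def load_knobs(env=None) -> dict:
    """Read the three knobs; unset caps come back as None (no cap)."""
    env = os.environ if env is None else env

    def opt_number(name, cast):
        raw = env.get(name)
        return None if raw in (None, "") else cast(raw)

    backpressure = opt_number("FLEETFORGE_QUEUE_BACKPRESSURE_MS", int)
    return {
        "max_inflight_tokens": opt_number("FLEETFORGE_MAX_INFLIGHT_TOKENS", int),
        "max_inflight_cost": opt_number("FLEETFORGE_MAX_INFLIGHT_COST", float),
        "backpressure_ms": DEFAULT_BACKPRESSURE_MS if backpressure is None else backpressure,
    }


def admit_or_defer(step_tokens, step_cost, inflight_tokens, inflight_cost,
                   not_before_ms, knobs) -> dict:
    max_tokens = knobs["max_inflight_tokens"]
    max_cost = knobs["max_inflight_cost"]
    over_tokens = max_tokens is not None and inflight_tokens + step_tokens > max_tokens
    over_cost = max_cost is not None and inflight_cost + step_cost > max_cost
    if over_tokens or over_cost:
        # Backpressure: push the step's not_before timestamp out by the defer window.
        return {"admitted": False, "not_before_ms": not_before_ms + knobs["backpressure_ms"]}
    return {"admitted": True, "not_before_ms": not_before_ms}
```

Calling load_knobs() with no arguments reads the process environment; passing a plain dictionary, as a test would, keeps the sketch independent of the host configuration.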

Architecture

  • core/policy/ – Policy engine, guardrail evaluators, and policy pack implementations.
  • core/runtime/src/guardrails.rs – Runtime integration that applies guardrail effects (deny, redact, modify) before and after executor calls.
  • policy-packs/ – Source of truth for pack definitions (hipaa, gdpr, owasp_demo, etc.).
  • core/policy/packs/prompt_injection/ – OPA/Rego source compiled to WebAssembly for prompt-injection detection.
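As a rough picture of what "applies guardrail effects before and after executor calls" means, the sketch below wraps an executor with pre- and post-call effects. The effect dictionary shape ("effect", "pattern", "prefix", "reason") and all function names are hypothetical; the actual contract lives in core/runtime/src/guardrails.rs.

```python
import re
from typing import Callable, List


class GuardrailDenied(Exception):
    """Raised when a guardrail returns a deny effect."""


def apply_effects(text: str, effects: List[dict]) -> str:
    for effect in effects:
        kind = effect["effect"]
        if kind == "deny":
            raise GuardrailDenied(effect.get("reason", "denied by policy"))
        elif kind == "redact":
            # Replace every match of the pattern with a fixed marker.
            text = re.sub(effect["pattern"], "[REDACTED]", text)
        elif kind == "modify":
            # Simplest possible modification: prepend policy-supplied text.
            text = effect["prefix"] + text
        else:
            raise ValueError(f"unknown guardrail effect: {kind}")
    return text


def guarded_call(prompt: str, executor: Callable[[str], str],
                 pre_effects: List[dict], post_effects: List[dict]) -> str:
    safe_prompt = apply_effects(prompt, pre_effects)   # before the executor call
    output = executor(safe_prompt)
    return apply_effects(output, post_effects)         # after the executor call
```

A deny effect short-circuits the call so the executor never runs, while redact and modify effects change what the executor sees or what leaves it.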

Telemetry & dashboards

  • Runtime spans expose the canonical trust.* OpenTelemetry attributes (trust.attestation_ids, trust.attestation_id, trust.subject, trust.policy_decision_id, trust.capability_token_id) defined in core/telemetry::TRUST_ATTRIBUTE_KEYS. Use these keys whenever you need to correlate a policy event with attestation evidence or a capability token.
  • The ClickHouse Grafana dashboard (deploy/otel/grafana-clickhouse-traces.json) now ships with a Policy Trust Chain panel that lets you filter spans by trust.attestation_id and the recorded policy effect so on-call operators can pivot directly from a failing run to its signed policy decisions.
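Outside Grafana, the same pivot can be approximated over exported spans. The sketch below assumes spans have already been loaded as dictionaries with an "attributes" mapping (a hypothetical export shape) and uses only the trust.* keys listed above.

```python
from typing import List


def spans_for_attestation(spans: List[dict], attestation_id: str) -> List[dict]:
    """Return spans that reference the attestation via either trust key."""
    matches = []
    for span in spans:
        attrs = span.get("attributes", {})
        single = attrs.get("trust.attestation_id")
        many = attrs.get("trust.attestation_ids", [])
        if single == attestation_id or attestation_id in many:
            matches.append(span)
    return matches


def decision_ids(spans: List[dict]) -> List[str]:
    """Collect the distinct policy decision IDs recorded on the given spans."""
    return sorted({
        span["attributes"]["trust.policy_decision_id"]
        for span in spans
        if "trust.policy_decision_id" in span.get("attributes", {})
    })
```

Starting from one attestation ID, spans_for_attestation narrows the trace to the relevant spans and decision_ids lists the policy decisions to look up next.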

Operator Checklist

  • Pick the tightest policy pack that meets your data requirements. Extend tool/image/network allowlists only after ChangeOps or ticket approvals.
  • Use fleetforge-ctl gates to require eval coverage before promoting prompt or tool changes.
  • Review policy_decisions artifacts in the UI or object store to understand why a run was denied or redacted.
  • Keep demos air-gapped: follow the demo hardening how-to when exposing environments publicly.

Pluggable policy engine & SDK

The Wasm guardrail runtime stays the default execution substrate, but customers can now run Open Policy Agent or Cedar policies beside those packs:

  • OPA bundles: Drop Rego bundles into any policy pack (policy-packs/<pack>/opa/). The runtime loads the bundle via opa::wasm, routes step inputs into the specified entrypoint, and records the verdict alongside the Wasm decision.
  • Cedar policies: Cedar JSON policies and schema definitions live under policy-packs/<pack>/cedar/. At startup the runtime compiles them, preserving ABAC/RBAC semantics and audit tooling that enterprises already trust.
  • Policy SDK: The forthcoming fleetforge-policy-sdk (TypeScript and Python) will ship lint/test/publish helpers so teams can reuse their compliance suites, run unit tests locally, and emit provenance metadata (repo URL, signer) before publishing a pack.

Select the engine per evaluation via policy.engine=wasm|opa|cedar, or enable policy.engine=multi to execute and attest multiple engines for the same step. Regardless of the authoring surface, ChangeOps gates and receipts see the same attestation envelope.
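The engine selection can be pictured as a small dispatcher. In this sketch the engines are plain callables returning "allow" or "deny" (hypothetical stand-ins for the Wasm, OPA, and Cedar evaluators), and the combined result for policy.engine=multi is assumed to be deny-wins; the source does not state how verdicts from multiple engines are reconciled.

```python
from typing import Callable, Dict

Engine = Callable[[dict], str]  # returns "allow" or "deny"


def evaluate(step_input: dict, engine: str, engines: Dict[str, Engine]) -> dict:
    if engine == "multi":
        selected = list(engines)          # run every configured engine
    elif engine in engines:
        selected = [engine]
    else:
        raise ValueError(f"unsupported policy.engine: {engine}")

    # Record each engine's verdict so all of them can be attested together.
    verdicts = {name: engines[name](step_input) for name in selected}
    # Assumption: any single deny makes the combined effect a deny.
    allowed = all(verdict == "allow" for verdict in verdicts.values())
    return {"engine": engine, "verdicts": verdicts, "effect": "allow" if allowed else "deny"}
```

Keeping the per-engine verdicts in the result, rather than only the combined effect, matches the idea that every engine's decision ends up in the same attestation envelope.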

Policy interoperability roadmap

FleetForge ships Rego policies compiled to Wasm today, but enterprise buyers expect alignment with the broader guardrail ecosystem. The roadmap focuses on three concurrent efforts (tracked on Roadmap & Status → Policy interoperability with evidence in Status & Acceptance → Policy interoperability):

  1. Rego bundle compatibility: adopt opa build-style bundles as a first-class package format so existing OPA authoring and analysis tooling (conftest, policy CI, drift detection) works with FleetForge packs. The new bundle loader will live in core/policy/src/bundles/ and emit the same Wasm modules the runtime executes today.
  2. Cedar schema bridges: regulated customers often standardize on AWS Cedar for ABAC/RBAC. We are adding a Cedar-to-Wasm translation pass that maps Cedar schema definitions into FleetForge's guardrail contract, enabling dual authoring and third-party audits without rewriting rules.
  3. Static analysis + attestations: all imported policies will carry provenance metadata (repo URL, commit, signer) inside the pack manifest so policy attestations show up alongside run receipts. Expect lint summaries in fleetforge-ctl policy inspect and a dedicated panel in the operator console.

These changes keep Rego/Wasm as the execution substrate while embracing established policy ecosystems for authoring, review, and hiring pipelines.
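A provenance check of the kind item 3 describes might look like the following. The manifest layout (a "provenance" object with "repo_url", "commit", and "signer" keys) is an assumption for illustration; the real pack manifest schema is not specified here.

```python
from typing import List

# Provenance fields named in the roadmap; the key spellings are assumed.
REQUIRED_PROVENANCE_FIELDS = ("repo_url", "commit", "signer")


def missing_provenance(manifest: dict) -> List[str]:
    """Return the provenance fields that are absent or empty in a pack manifest."""
    provenance = manifest.get("provenance") or {}
    return [field for field in REQUIRED_PROVENANCE_FIELDS if not provenance.get(field)]


def can_import(manifest: dict) -> bool:
    # An imported policy is acceptable only when every provenance field is set.
    return not missing_provenance(manifest)
```

A lint step could report the missing_provenance list directly, which gives authors a precise reason when an imported pack is rejected.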

Capability credentials roadmap

Capability tokens already scope tool/action/budget access (core/runtime/src/capability.rs), but investigators increasingly need portable credentials they can verify offline. We are layering two additions on top of the existing contract (see Roadmap & Status → Portable capability credentials for timing and Status & Acceptance → Portable capability credentials for readiness):

  • Biscuit-style attenuation: the runtime will export capability chains as Biscuit v2 tokens so downstream services can narrow privileges (time, budget, tool subsets) without calling back to FleetForge. Each biscuit embeds the existing scope digest and budget limits.
  • Verifiable Credential projection: C2PA manifests and SCITT entries will reference a W3C VC representation of the capability chain so auditors and partner teams can validate receipts in air-gapped environments. The projection includes the capability token ID, signer, expiry, and the attestation IDs that prove the token was enforced.
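The key property of attenuation is that a derived capability can only be narrower than its parent. The sketch below illustrates that property with a plain dataclass; it does not use a Biscuit library or any cryptography, and the Capability fields are hypothetical simplifications of the scope that core/runtime/src/capability.rs manages.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Capability:
    tools: FrozenSet[str]
    budget_tokens: int
    expires_at: int  # Unix seconds


def attenuate(parent: Capability,
              tools: Optional[Iterable[str]] = None,
              budget_tokens: Optional[int] = None,
              expires_at: Optional[int] = None) -> Capability:
    """Derive a child capability that never exceeds the parent's scope."""
    return Capability(
        # Intersection: requested tools the parent lacks are silently dropped.
        tools=parent.tools if tools is None else parent.tools & frozenset(tools),
        # min(): a child cannot raise its budget or extend its lifetime.
        budget_tokens=parent.budget_tokens if budget_tokens is None
        else min(parent.budget_tokens, budget_tokens),
        expires_at=parent.expires_at if expires_at is None
        else min(parent.expires_at, expires_at),
    )
```

Because each restriction is an intersection or a minimum, a downstream service can apply attenuate repeatedly without contacting the issuer and still never widen the original grant.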

Both features stay optional at first; toggle them via FLEETFORGE_CAPABILITY_EXPORTS so tenants can opt into Biscuit and VC exports independently while we graduate the format documented in Status & Acceptance → Portable capability credentials.

Policy packs marketplace

To speed up regulated launches, FleetForge is curating a marketplace of audited policy packs:

  • EU AI Act “deployer” pack: ships transparency notices, logging mandates, and retention ≥6 months, mapping directly to Articles 52–54. Prompts, tool calls, and exports automatically include the prescribed notices, default logging fields (purpose, provider, data sources), and transparency banners the Act requires before outputs leave FleetForge. /demo will offer a one-click “Apply EU AI Act template” flow so reviewers can see the pack enabled without editing config files.
  • Sector packs: HIPAA, PCI, and financial controls bundle guardrails, retention policies, and ChangeOps gates that clear common compliance checks.
  • Template publishing: Operators can clone packs, apply local diffs, and re-share them with signed provenance so internal audit trusts the lineage.

Marketplace packs live under policy-packs-enterprise/ with metadata describing their regulatory mapping. Update docs/reference/policy/regulated.md whenever a pack graduates so customers know which audits it satisfies out of the box, and track acceptance on Status & Acceptance → Policy packs marketplace.