Policy System Overview
Policies let you describe what’s allowed in a workflow (budget, tools, data, and network paths) and show proof later when something is denied or allowed.
FleetForge enforces policy at multiple layers so every agent run stays within approved safety, cost, and compliance boundaries. This page explains how the policy engine works and how to reason about guardrails in production.
Layers
- Step guardrails (policy.guardrails[]): Per-step flags that enable prompt injection filtering, PII controls, sandbox hints, and HTTP allowlists.
- Policy packs (FLEETFORGE_POLICY_PACK): Runtime-wide bundles (HIPAA, GDPR, OWASP demo, allow_all) that configure default sandboxes, tool/image allowlists, and regulated behaviours.
- Budget policies: Caps enforced by the scheduler that fail steps when reserved or actual spend crosses configured thresholds.
- ChangeOps gates: Release checks that use telemetry and eval results to block un-reviewed changes before shipping (see ChangeOps concept).
Budget & SLO contracts
Every step spec may declare both a spend ceiling and an SLO tier inside the
policy block. The budget values gate how many tokens/cost units the scheduler
is willing to reserve up front, while the SLO metadata drives queue ordering and
telemetry.
"policy": {
  "budget": {
    "tokens": 200000,
    "cost": 3.00
  },
  "slo": {
    "tier": "gold",
    "queue_target_ms": 5000,
    "priority_boost": 2,
    "error_budget_ratio": 0.20
  }
}
- budget.tokens / budget.cost – maximum reservation for the step. When set, the scheduler calls Storage::try_consume_budget before execution and fails the step with kind=budget_exceeded if the run has already exhausted its quota. Absent values fall back to execution.cost_hint or the adapter inputs (token_estimate, cost_estimate).
- slo.queue_target_ms – desired queue latency. The scheduler continuously tracks observed_queue_ms, computes slack (target - wait), and prioritises breached contracts ahead of best-effort work.
- slo.priority_boost – additional priority offset applied during ranking so tiers can jump the queue even before breaching.
- slo.tier / slo.error_budget_ratio – descriptive metadata captured in telemetry and outbox events so TrustOps scorecards can group breaches per contract.
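The reservation rule above can be sketched in a few lines. This is an illustration in Python, not the scheduler's code (which lives in the Rust runtime). RunBudget, BudgetExceeded, and resolve_reservation are hypothetical names, and both the dictionary shape of execution.cost_hint and the fallback order (policy budget, then cost hint, then adapter inputs) are assumptions.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a reservation would push the run past its quota."""
    kind = "budget_exceeded"


@dataclass
class RunBudget:
    """Hypothetical stand-in for the run-level quota tracked in storage."""
    tokens_left: int
    cost_left: float

    def try_consume(self, tokens: int, cost: float) -> None:
        # Reserve both dimensions together: either both fit or nothing is taken.
        if tokens > self.tokens_left or cost > self.cost_left:
            raise BudgetExceeded(f"need {tokens} tokens / {cost} cost")
        self.tokens_left -= tokens
        self.cost_left -= cost


def resolve_reservation(step: dict) -> tuple[int, float]:
    """Pick the reservation: policy.budget first, then fallbacks (order assumed)."""
    budget = step.get("policy", {}).get("budget", {})
    hint = step.get("execution", {}).get("cost_hint", {})
    inputs = step.get("inputs", {})
    tokens = budget.get("tokens", hint.get("tokens", inputs.get("token_estimate", 0)))
    cost = budget.get("cost", hint.get("cost", inputs.get("cost_estimate", 0.0)))
    return tokens, cost


run = RunBudget(tokens_left=250_000, cost_left=5.0)
step = {"policy": {"budget": {"tokens": 200_000, "cost": 3.00}}}

run.try_consume(*resolve_reservation(step))      # first reservation fits
try:
    run.try_consume(*resolve_reservation(step))  # second one exceeds the quota
except BudgetExceeded as err:
    print(err.kind)  # prints "budget_exceeded"
```

Checking both dimensions before subtracting either keeps a failed reservation from partially draining the run's quota.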
The runtime emits fleetforge.slo.queue_slack_ms and
fleetforge.slo.preemptions metrics whenever a contracted step is evaluated, so
dashboards can visualise queue health and throttling behaviour. Budget snapshots
and SLO state are also attached to every step_attempt row and run event
payload, feeding ChangeOps’ “budget scorecard”.
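The slack value behind fleetforge.slo.queue_slack_ms is the queue target minus the observed wait. The sketch below shows one way slack and priority_boost could combine into a queue order. QueuedStep and rank_key are hypothetical, and the exact ranking key is an assumption rather than the scheduler's actual formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedStep:
    name: str
    observed_queue_ms: int
    queue_target_ms: Optional[int] = None  # None means best-effort (no SLO)
    priority_boost: int = 0

    @property
    def slack_ms(self) -> Optional[int]:
        # slack = target - wait; negative slack means the contract is breached
        if self.queue_target_ms is None:
            return None
        return self.queue_target_ms - self.observed_queue_ms


def rank_key(step: QueuedStep):
    slack = step.slack_ms
    breached = slack is not None and slack < 0
    # Sort order: breached contracts first, then higher boost, then least slack.
    # Best-effort steps have no slack and sort last within their boost level.
    return (not breached, -step.priority_boost,
            slack if slack is not None else float("inf"))


queue = [
    QueuedStep("best_effort", observed_queue_ms=9000),
    QueuedStep("gold", observed_queue_ms=1000, queue_target_ms=5000, priority_boost=2),
    QueuedStep("silver_breached", observed_queue_ms=7000, queue_target_ms=6000),
]
ordered = [s.name for s in sorted(queue, key=rank_key)]
print(ordered)  # ['silver_breached', 'gold', 'best_effort']
```

The gold step has 4000 ms of slack left, yet its boost still places it ahead of best-effort work, which matches the "jump the queue even before breaching" behaviour described for slo.priority_boost.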
Runtime knobs
- FLEETFORGE_MAX_INFLIGHT_TOKENS – caps the total reserved tokens across in-flight steps; the scheduler defers additional work once the limit is hit.
- FLEETFORGE_MAX_INFLIGHT_COST – identical control for USD cost reservations.
- FLEETFORGE_QUEUE_BACKPRESSURE_MS – overrides the default 50 ms delay that backpressure deferrals add to a step’s not_before timestamp.
Leaving these variables unset keeps the default behaviour (no inflight cap and
a 50 ms defer window). All three knobs are read in
core/runtime/src/scheduler.rs, so Helm charts and the toolbox can set them per
environment.
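As a rough illustration of how these knobs interact, the sketch below loads the three variables with the documented defaults and applies the backpressure delay when the token cap would be exceeded. InflightLimits, load_limits, and next_not_before are hypothetical names, and treating "limit is hit" as "reservation would exceed the cap" is an assumption. The cost cap would work the same way and is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InflightLimits:
    max_tokens: Optional[int]   # None = no cap (variable unset)
    max_cost: Optional[float]   # None = no cap (variable unset)
    backpressure_ms: int        # delay added to not_before on deferral


def load_limits(env: Mapping[str, str]) -> InflightLimits:
    """Read the knobs from an environment mapping, e.g. os.environ."""
    tokens = env.get("FLEETFORGE_MAX_INFLIGHT_TOKENS")
    cost = env.get("FLEETFORGE_MAX_INFLIGHT_COST")
    return InflightLimits(
        max_tokens=int(tokens) if tokens else None,
        max_cost=float(cost) if cost else None,
        backpressure_ms=int(env.get("FLEETFORGE_QUEUE_BACKPRESSURE_MS", "50")),
    )


def next_not_before(now_ms: int, inflight_tokens: int, step_tokens: int,
                    limits: InflightLimits) -> int:
    """Return the step's not_before: now if admitted, now + delay if deferred."""
    over_cap = (
        limits.max_tokens is not None
        and inflight_tokens + step_tokens > limits.max_tokens
    )
    return now_ms + limits.backpressure_ms if over_cap else now_ms


capped = load_limits({"FLEETFORGE_MAX_INFLIGHT_TOKENS": "500000"})
uncapped = load_limits({})

print(next_not_before(1_000, 400_000, 200_000, capped))    # 1050: deferred by 50 ms
print(next_not_before(1_000, 400_000, 200_000, uncapped))  # 1000: no cap configured
```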
Architecture
- core/policy/ – Policy engine, guardrail evaluators, and policy pack implementations.
- core/runtime/src/guardrails.rs – Runtime integration that applies guardrail effects (deny, redact, modify) before and after executor calls.
- policy-packs/ – Source of truth for pack definitions (hipaa, gdpr, owasp_demo, etc.).
- core/policy/packs/prompt_injection/ – OPA/Rego source compiled to WebAssembly for prompt-injection detection.
Telemetry & dashboards
- Runtime spans expose the canonical trust.* OpenTelemetry attributes (trust.attestation_ids, trust.attestation_id, trust.subject, trust.policy_decision_id, trust.capability_token_id) defined in core/telemetry::TRUST_ATTRIBUTE_KEYS. Use these keys whenever you need to correlate a policy event with attestation evidence or a capability token.
- The ClickHouse Grafana dashboard (deploy/otel/grafana-clickhouse-traces.json) now ships with a Policy Trust Chain panel that lets you filter spans by trust.attestation_id and the recorded policy effect so on-call operators can pivot directly from a failing run to its signed policy decisions.
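The same correlation can be done offline against exported spans. The sketch below filters span records by attestation ID and keeps only the trust.* attributes listed above. The span record shape (a dictionary with an attributes map), the span names, and the assumption that trust.attestation_ids holds a list of strings are all illustrative, not the exporter's confirmed format.

```python
# Attribute names as listed on this page; the tuple itself is a local copy.
TRUST_ATTRIBUTE_KEYS = (
    "trust.attestation_ids",
    "trust.attestation_id",
    "trust.subject",
    "trust.policy_decision_id",
    "trust.capability_token_id",
)


def trust_chain(spans, attestation_id):
    """Return the trust.* attributes of every span that references attestation_id."""
    matches = []
    for span in spans:
        attrs = span.get("attributes", {})
        # A span may carry a single ID, a list of IDs, or both.
        ids = set(attrs.get("trust.attestation_ids", []))
        ids.add(attrs.get("trust.attestation_id"))
        if attestation_id in ids:
            matches.append({k: attrs[k] for k in TRUST_ATTRIBUTE_KEYS if k in attrs})
    return matches


spans = [
    {"name": "guardrail.check", "attributes": {
        "trust.attestation_id": "att-1", "trust.policy_decision_id": "dec-9"}},
    {"name": "executor.call", "attributes": {
        "trust.attestation_ids": ["att-1", "att-2"], "trust.subject": "step-42"}},
    {"name": "unrelated", "attributes": {"trust.attestation_id": "att-3"}},
]

chain = trust_chain(spans, "att-1")
print(len(chain))  # 2
```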
Operator Checklist
- Pick the tightest policy pack that meets your data requirements. Extend tool/image/network allowlists only after ChangeOps or ticket approvals.
- Use fleetforge-ctl gates to require eval coverage before promoting prompt or tool changes.
- Review policy_decisions artifacts in the UI or object store to understand why a run was denied or redacted.
- Keep demos air-gapped: follow the demo hardening how-to when exposing environments publicly.
Related Docs
- Reference: Guardrail matrix
- Reference: Policy packs
- How-to: Add LangGraph adapter
- Tutorial: Hello Fleet walkthrough
Pluggable policy engine & SDK
The Wasm guardrail runtime stays the default execution substrate, but customers can now run Open Policy Agent or Cedar policies beside those packs:
- OPA bundles: Drop Rego bundles into any policy pack (policy-packs/<pack>/opa/). The runtime loads the bundle via opa::wasm, routes step inputs into the specified entrypoint, and records the verdict alongside the Wasm decision.
- Cedar policies: Cedar JSON policies and schema definitions live under policy-packs/<pack>/cedar/. At startup the runtime compiles them, preserving ABAC/RBAC semantics and audit tooling that enterprises already trust.
- Policy SDK: The forthcoming fleetforge-policy-sdk (TypeScript and Python) ships lint/test/publish helpers so teams can reuse their compliance suites, run unit tests locally, and emit provenance metadata (repo URL, signer) before publishing a pack.
Select the engine per evaluation via policy.engine=wasm|opa|cedar, or enable
policy.engine=multi to execute and attest multiple engines for the same step.
Regardless of the authoring surface, ChangeOps gates and receipts see the same
attestation envelope.
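Engine selection can be pictured as a small dispatcher. The sketch below is a conceptual model only: evaluate and the stub engines are hypothetical, the verdicts are reduced to allow/deny (the real effects also include redact and modify), and the rule that any deny wins under multi is an assumption this page does not confirm.

```python
from typing import Callable, Dict

Verdict = str  # "allow" or "deny" in this simplified model


def evaluate(policy: dict, step_input: dict,
             engines: Dict[str, Callable[[dict], Verdict]]) -> dict:
    """Run the engine(s) selected by policy.engine and collect their verdicts."""
    engine = policy.get("engine", "wasm")  # Wasm is the default substrate
    if engine == "multi":
        selected = list(engines)
    elif engine in engines:
        selected = [engine]
    else:
        raise ValueError(f"unknown policy.engine: {engine!r}")

    verdicts = {name: engines[name](step_input) for name in selected}
    # Assumption: under multi, a single deny is enough to deny the step.
    effect = "deny" if "deny" in verdicts.values() else "allow"
    return {"engine": engine, "verdicts": verdicts, "effect": effect}


# Stub engines standing in for the Wasm, OPA, and Cedar evaluators.
engines = {
    "wasm": lambda step: "allow",
    "opa": lambda step: "deny" if step.get("tool") == "shell" else "allow",
    "cedar": lambda step: "allow",
}

result = evaluate({"engine": "multi"}, {"tool": "shell"}, engines)
print(result["effect"])  # deny
```

Recording every engine's verdict, rather than only the combined effect, mirrors the idea of attesting multiple engines for the same step.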
Policy interoperability roadmap
FleetForge ships Rego policies compiled to Wasm today, but enterprise buyers expect alignment with the broader guardrail ecosystem. The roadmap focuses on three concurrent efforts (tracked on Roadmap & Status → Policy interoperability with evidence in Status & Acceptance → Policy interoperability):
- Rego bundle compatibility: adopt opa build-style bundles as a first-class package format so existing OPA authoring and analysis tooling (conftest, policy CI, drift detection) works with FleetForge packs. The new bundle loader will live in core/policy/src/bundles/ and emit the same Wasm modules the runtime executes today.
- Cedar schema bridges: regulated customers often standardize on AWS Cedar for ABAC/RBAC. We are adding a Cedar-to-Wasm translation pass that maps Cedar schema definitions into FleetForge's guardrail contract, enabling dual authoring and third-party audits without rewriting rules.
- Static analysis + attestations: all imported policies will carry provenance metadata (repo URL, commit, signer) inside the pack manifest so policy attestations show up alongside run receipts. Expect lint summaries in fleetforge-ctl policy inspect and a dedicated panel in the operator console.
These changes keep Rego/Wasm as the execution substrate while embracing established policy ecosystems for authoring, review, and hiring pipelines.
Capability credentials roadmap
Capability tokens already scope tool/action/budget access (core/runtime/src/capability.rs),
but investigators increasingly need portable credentials they can verify offline.
We are layering two additions on top of the existing contract (see Roadmap & Status → Portable capability credentials for timing and Status & Acceptance → Portable capability credentials for readiness):
- Biscuit-style attenuation: the runtime will export capability chains as Biscuit v2 tokens so downstream services can narrow privileges (time, budget, tool subsets) without calling back to FleetForge. Each biscuit embeds the existing scope digest and budget limits.
- Verifiable Credential projection: C2PA manifests and SCITT entries will reference a W3C VC representation of the capability chain so auditors and partner teams can validate receipts in air-gapped environments. The projection includes the capability token ID, signer, expiry, and the attestation IDs that prove the token was enforced.
Both features stay optional at first; toggle them via FLEETFORGE_CAPABILITY_EXPORTS
so tenants can opt into Biscuit and VC exports independently while we graduate
the format documented in Status & Acceptance → Portable capability credentials.
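The attenuation idea can be shown without any token format: a derived capability may only narrow what its parent allows. The sketch below models that rule alone. It does not use a Biscuit library or reproduce Biscuit's signed block chain, and the Capability fields and attenuate helper are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Capability:
    tools: FrozenSet[str]
    budget_cost: float
    expires_at: int  # Unix seconds


def attenuate(parent: Capability,
              tools: Optional[Iterable[str]] = None,
              budget_cost: Optional[float] = None,
              expires_at: Optional[int] = None) -> Capability:
    """Derive a child capability; every field can only get narrower."""
    return Capability(
        # Tool set: intersection, so unknown tools can never be added.
        tools=parent.tools if tools is None else parent.tools & frozenset(tools),
        # Budget and expiry: the smaller of parent and requested value.
        budget_cost=parent.budget_cost if budget_cost is None
        else min(parent.budget_cost, budget_cost),
        expires_at=parent.expires_at if expires_at is None
        else min(parent.expires_at, expires_at),
    )


root = Capability(frozenset({"search", "sql", "email"}),
                  budget_cost=3.0, expires_at=1_900_000_000)

# The caller asks for an extra tool and a larger budget; both are ignored.
child = attenuate(root, tools={"sql", "shell"}, budget_cost=10.0,
                  expires_at=1_800_000_000)
print(sorted(child.tools), child.budget_cost, child.expires_at)
# ['sql'] 3.0 1800000000
```

Because narrowing needs only the parent's values, a downstream service can apply it without calling back to the issuer, which is the property the roadmap item is after.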
Policy packs marketplace
To speed up regulated launches, FleetForge is curating a marketplace of audited policy packs:
- EU AI Act “deployer” pack: ships transparency notices, logging mandates, and retention ≥6 months, mapping directly to Articles 52–54. Prompts, tool calls, and exports automatically include the prescribed notices, default logging fields (purpose, provider, data sources), and transparency banners the Act requires before outputs leave FleetForge. /demo will offer a one-click “Apply EU AI Act template” flow so reviewers can see the pack enabled without editing config files.
- Sector packs: HIPAA, PCI, and financial controls bundle guardrails, retention policies, and ChangeOps gates that clear common compliance checks.
- Template publishing: Operators can clone packs, apply local diffs, and re-share them with signed provenance so internal audit trusts the lineage.
Marketplace packs live under policy-packs-enterprise/ with metadata describing
their regulatory mapping. Update docs/reference/policy/regulated.md whenever a
pack graduates so customers know which audits it satisfies out of the box, and
track acceptance on Status & Acceptance → Policy packs marketplace.