FleetForge North Star
Complements the README introduction. Use this page to keep the product thesis, differentiation, and go-to-market guardrails in one canonical spot.
Product Thesis
FleetForge is the Trust Mesh for AI workflows and agent fleets: an identity, provenance, and policy plane that makes every agent action verifiable, governable, and replayable without forcing a new authoring framework. Teams keep their LangGraph, Microsoft Agent Framework/AutoGen, CrewAI, or bespoke orchestration, and FleetForge guarantees five non-negotiables that build on our Delivery/Policy/Replay heritage:
- Identity-secured steps - capability tokens minted per agent hop define who can invoke which tool, over what data, for how long, and at what budget.
- Provenance-sealed traces - every run emits cryptographically signed, deterministic envelopes that replay exactly, even months later.
- Policy everywhere - the same policy-as-code enforces standards at the runtime edge (tools, IO, context) and in promotion workflows.
- Attested delivery - release decisions consume the identical evidence captured at runtime, evolving ChangeOps into TrustOps.
- SLO + spend discipline - orchestration honours fleet-wide SLO and budget contracts using real-time telemetry.
By operating as the Trust Mesh, FleetForge moves beyond "plumbing." We give security, compliance, and platform teams concrete proof that OWASP LLM Top-10 threats are mitigated, NIST AI-600-1 controls are satisfied, and every change ships with receipts. The mesh is vendor-agnostic, policy-driven, and replay-backed.
Trust Mesh Pillars
Zero-Trust Tooling Mesh
- Mint ephemeral capability tokens per agent step; scopes capture tool ID, argument schema, data domain, cost/latency budget, and expiry.
- Enforce policy-as-code compiled to Wasm (Open Policy Agent) at every call-site so decisions execute inline across runtimes, browsers, and gateways.
- Map guardrails directly to OWASP LLM Top-10 categories (prompt injection, insecure outputs, excessive agency) and log verdicts as signed events.
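The sketch below mints and checks a per-hop capability token, to make the token scopes concrete. It is a minimal Python sketch with several assumptions:

- an HMAC-SHA256 key stands in for a production signer;
- the argument schema is reduced to a set of allowed keys;
- the names (`mint_token`, `authorize`, the claim fields) are hypothetical, not the FleetForge token format.

Every failed check denies the call, so authorization fails closed.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: a shared HMAC key stands in for a KMS-backed signer.
SECRET = b"dev-only-secret"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str) -> str:
    return _b64(hmac.new(SECRET, body.encode(), hashlib.sha256).digest())


def mint_token(agent: str, tool_id: str, arg_keys: list[str], data_domain: str,
               max_cost_usd: float, ttl_s: int, now: Optional[float] = None) -> str:
    """Mint a scoped, expiring capability token for one agent hop."""
    now = time.time() if now is None else now
    claims = {
        "sub": agent,
        "tool_id": tool_id,
        "arg_keys": sorted(arg_keys),  # stand-in for a full argument schema
        "data_domain": data_domain,
        "max_cost_usd": max_cost_usd,
        "exp": now + ttl_s,
    }
    body = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{body}.{_sign(body)}"


def authorize(token: str, tool_id: str, args: dict, data_domain: str,
              est_cost_usd: float, now: Optional[float] = None) -> tuple[bool, str]:
    """Fail closed: any parse, signature, or scope mismatch denies the call."""
    now = time.time() if now is None else now
    try:
        body, sig = token.split(".")
    except ValueError:
        return False, "malformed token"
    if not hmac.compare_digest(sig, _sign(body)):
        return False, "bad signature"
    claims = json.loads(_unb64(body))
    if now >= claims["exp"]:
        return False, "expired"
    if claims["tool_id"] != tool_id:
        return False, "tool not in scope"
    if sorted(args) != claims["arg_keys"]:
        return False, "arguments outside schema"
    if claims["data_domain"] != data_domain:
        return False, "data domain not in scope"
    if est_cost_usd > claims["max_cost_usd"]:
        return False, "over budget"
    return True, "allowed"
```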
Replay Twin & Counterfactual Lab
- Persist sealed traces that lock prompts, retrieved context, model/tool versions, and side effects; every envelope carries a provenance hash chain.
- Re-run "ghost" executions to test prompts/models/policies against identical inputs, surfacing score-per-token, hallucination deltas, latency, and spend shifts before promotion.
- Use deterministic replay to satisfy NIST AI-600-1 requirements for incident reconstruction, safety verification, and lifecycle governance.
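A few lines of Python can illustrate a provenance hash chain. Each envelope stores the hash of its predecessor, so editing any recorded step breaks verification at that envelope. This sketch assumes canonical JSON payloads and SHA-256. The envelope field names are hypothetical, and signing the chain head is left out.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # fixed predecessor hash for the first envelope


def envelope_hash(prev_hash: str, payload: dict) -> str:
    """Hash the canonical JSON payload together with the previous link."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


def seal(steps: list[dict]) -> list[dict]:
    """Wrap each step payload in an envelope linked to its predecessor."""
    chain, prev = [], GENESIS
    for payload in steps:
        h = envelope_hash(prev, payload)
        chain.append({"prev": prev, "hash": h, "payload": payload})
        prev = h
    return chain


def verify(chain: list[dict]) -> Optional[int]:
    """Return the index of the first broken envelope, or None if intact."""
    prev = GENESIS
    for i, env in enumerate(chain):
        if env["prev"] != prev or env["hash"] != envelope_hash(prev, env["payload"]):
            return i
        prev = env["hash"]
    return None
```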
TrustOps Graph (ChangeOps evolved)
- Promotion gates consume the same capability tokens, policy verdicts, and replay diffs captured at runtime.
- Gates pull OpenTelemetry GenAI spans/metrics (latency, token spend, cost, eval scores) annotated with attestation IDs so release evidence is machine-verifiable.
- Automate canary/shadow/rollback decisions with policy-as-code; TrustOps is the shared control surface for runtime and release governance.
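A promotion gate that consumes runtime evidence might reduce to a function like the one below. This is an illustrative Python sketch, not TrustOps itself. The `ReleaseEvidence` shape, the default thresholds, and the replay cost-delta check are assumptions; a real gate would express the same rules as policy-as-code.

```python
from dataclasses import dataclass


@dataclass
class ReleaseEvidence:
    """Runtime evidence a candidate release carries into the gate (hypothetical shape)."""
    attestation_ids: set[str]
    policy_verdicts: dict[str, str]  # policy decision ID -> "allow" or "deny"
    eval_score: float                # aggregate eval score in [0, 1]
    replay_cost_delta_pct: float     # candidate vs. baseline spend on replayed traces


def promotion_gate(ev: ReleaseEvidence, required: set[str], min_eval: float = 0.8,
                   max_cost_delta_pct: float = 10.0) -> tuple[str, list[str]]:
    """Return ("promote" or "block", reasons); every failed check adds a reason."""
    reasons = []
    missing = sorted(required - ev.attestation_ids)
    if missing:
        reasons.append("missing attestations: " + ", ".join(missing))
    denied = sorted(d for d, v in ev.policy_verdicts.items() if v != "allow")
    if denied:
        reasons.append("policy denials: " + ", ".join(denied))
    if ev.eval_score < min_eval:
        reasons.append(f"eval score {ev.eval_score:.2f} < {min_eval:.2f}")
    if ev.replay_cost_delta_pct > max_cost_delta_pct:
        reasons.append(f"replay cost +{ev.replay_cost_delta_pct:.1f}% > {max_cost_delta_pct:.1f}%")
    return ("block" if reasons else "promote", reasons)
```

Returning every reason rather than the first keeps the blocked decision self-explanatory for reviewers.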
AIBOM & Attestation Fabric
- Emit AI Bills of Materials (datasets, models, tools, policies, adapters, dependency hashes) per deploy and per incident using CycloneDX ML-BOM profiles.
- Chain SLSA and in-toto provenance for every runtime step and promotion event; publish SCITT-compatible transparency feeds for third-party verification.
- Attach C2PA Content Credentials to generated artifacts so downstream consumers can check origin, policy status, and moderation evidence.
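For orientation, the sketch below builds a minimal CycloneDX 1.5 JSON document with hashed components such as a model, a dataset, and a policy bundle. Two choices are assumptions: how FleetForge artifacts map to CycloneDX component types, and the `fleetforge:deploy_id` property name. A production AIBOM would likely also carry model cards, dependency relationships, and signatures.

```python
import hashlib
import uuid


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ml_bom(deploy_id: str, components: list[tuple[str, str, str, bytes]]) -> dict:
    """Build a minimal CycloneDX 1.5 JSON BOM.

    components: (cyclonedx_type, name, version, artifact_bytes) tuples, e.g.
    "machine-learning-model" for models, "data" for datasets, "file" for policy bundles.
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        # Hypothetical property linking the BOM to a FleetForge deploy.
        "metadata": {"properties": [{"name": "fleetforge:deploy_id", "value": deploy_id}]},
        "components": [
            {
                "type": ctype,
                "name": name,
                "version": version,
                "hashes": [{"alg": "SHA-256", "content": sha256_hex(blob)}],
            }
            for ctype, name, version, blob in components
        ],
    }
```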
Budget-Aware SLO Scheduling
- Orchestrate fleets with budget-aware schedulers that prioritise jobs by SLO x cost x policy risk.
- Adjust concurrency, model choice, and fallbacks based on GenAI telemetry; enforce tenant-specific caps and rate fairness.
- Surface proactive alerts and auto-remediation hooks when spend or SLO drift threatens contractual tolerances.
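One way to read "SLO x cost x policy risk" is as an urgency term divided by cost and risk penalties. The Python sketch below assumes:

- that scoring formula;
- a per-tenant spend cap per scheduling window;
- hypothetical `Job` fields.

FleetForge's actual scheduler weighting is not specified on this page.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    tenant: str
    slack_s: float   # seconds left before the job's SLO deadline
    est_cost: float  # estimated spend in USD
    risk: float      # policy risk score in [0, 1]


def priority(job: Job) -> float:
    """Higher is more urgent: tight SLOs raise it, cost and policy risk lower it."""
    urgency = 1.0 / max(job.slack_s, 1.0)
    return urgency / ((1.0 + job.est_cost) * (1.0 + job.risk))


def schedule(jobs: list[Job], tenant_caps: dict[str, float], slots: int) -> list[str]:
    """Pick up to `slots` jobs by priority without exceeding any tenant's spend cap."""
    # job_id breaks ties so Job objects are never compared directly.
    heap = [(-priority(j), j.job_id, j) for j in jobs]
    heapq.heapify(heap)
    spent: dict[str, float] = {}
    picked = []
    while heap and len(picked) < slots:
        _, _, job = heapq.heappop(heap)
        if spent.get(job.tenant, 0.0) + job.est_cost > tenant_caps.get(job.tenant, 0.0):
            continue  # would breach the tenant cap; skip it in this window
        spent[job.tenant] = spent.get(job.tenant, 0.0) + job.est_cost
        picked.append(job.job_id)
    return picked
```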
Standards alignment & scope guardrails
- OpenTelemetry GenAI – FleetForge emits the `trust.*` vendor attribute set alongside the OTEL GenAI semantic conventions and honours `OTEL_SEMCONV_STABILITY_OPT_IN=gen-ai` so collectors can pin schemas during upgrades.
- C2PA Content Credentials – user-facing artifacts carry C2PA manifests referencing the capability token chain and signer profile so provenance travels outside FleetForge.
- SCITT transparency – ChangeOps/TrustOps publish SCITT statements when promotion gates fire, giving auditors portable, independently verifiable receipts.
- Complementary tooling – Observability and eval stacks such as LangSmith, Phoenix, Weave, and Traceloop remain in place; FleetForge focuses on enforceable policy, attestations, and deterministic replay, not dashboarding.
Differentiation & Ecosystem Fit
FleetForge integrates with the stacks teams already run; we add the trust plane that none of them cover.
| Concern | Authoring (LangGraph/AutoGen/CrewAI) | Observability (LangSmith/Langfuse/Phoenix) | Workflow (Temporal/Prefect/Airflow) | FleetForge Trust Mesh |
|---|---|---|---|---|
| Agent graph / roles / tools | Yes | No | No | Bring your own |
| Durable execution & SLOs | Partial | No | Yes | Yes (agent-aware) |
| Safety & cost policy gates | Partial / ad-hoc | Alerts/evals | No | Policy-as-code + Wasm |
| Deterministic replays | Partial checkpoints | Re-evals | Partial history | End-to-end, attested |
| Identity & provenance | No | No | No | Signed attestations |
| Change/TrustOps | No | No | No | Telemetry + attestations |
Integration stance
- Maintain adapters in `sdk/python/fleetforge` and `core/runtime/src/adapters.rs` so LangGraph, Agent Framework/AutoGen, CrewAI, and custom orchestrators emit FleetForge envelopes without refactoring authoring logic.
- Emit OpenTelemetry GenAI spans enriched with trust metadata (capability token IDs, attestation links) so LangSmith, Langfuse, and Phoenix stay first-class destinations while their eval outputs feed back into TrustOps gates.
- Keep every guarantee framework-neutral: capability tokens, policy packs, replays, TrustOps, and attestations work even when customers swap authoring SDKs or run multiple in parallel.
Standards & Proof Alignment
- OWASP LLM Top-10 – default policy packs ship enforcement examples mapped to each category; violations produce signed Trust Mesh events.
- NIST AI-600-1 (Generative AI Profile) – governance checkpoints trace directly to the profile's actions; replay artifacts and TrustOps approvals are stored as auditable records against those controls.
- NIST AI RMF 1.0 – risk management outcomes map to TrustOps gates and Hello Fleet evidence, anchoring policy and delivery stories to the RMF’s Govern, Map, Measure, and Manage functions.
- OpenTelemetry GenAI – spans include the canonical `trust.*` fields (`trust.attestation_ids`, `trust.attestation_id`, `trust.subject`, `trust.policy_decision_id`, `trust.capability_token_id`). These are FleetForge vendor attributes aligned to the OpenTelemetry GenAI Semantic Conventions. Pin the schema with:

  ```shell
  export OTEL_SEMCONV_STABILITY_OPT_IN=gen-ai
  ```

  We also monitor the upstream “agentic systems” semconv proposal so key names only change when the spec stabilises; the live keyset stays documented in `core/telemetry::TRUST_ATTRIBUTE_KEYS`.
- C2PA Content Credentials – generated outputs embed provenance manifests referencing capability tokens and moderation verdicts.
- SLSA + in-toto – pipeline, runtime, and replay steps emit attestations verifying code, model, and policy supply chain integrity.
- IETF SCITT – transparency feeds expose signed ledger entries for capability token issuance, policy updates, and TrustOps decisions so external auditors can verify them independently.
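One way to keep span attributes inside the documented keyset is a small validation helper. The helper below prefixes keyword fields with `trust.` and rejects anything outside the five keys listed for OpenTelemetry GenAI above. The Python tuple is a local mirror assumed for illustration; `core/telemetry::TRUST_ATTRIBUTE_KEYS` remains the authoritative list. With the OpenTelemetry Python API, the returned mapping can be passed to `span.set_attributes(...)` on the current span.

```python
from typing import Sequence, Union

# Local mirror of the documented keys (assumption); the core telemetry keyset is authoritative.
TRUST_ATTRIBUTE_KEYS = (
    "trust.attestation_ids",
    "trust.attestation_id",
    "trust.subject",
    "trust.policy_decision_id",
    "trust.capability_token_id",
)

AttrValue = Union[str, Sequence[str]]


def trust_attributes(**fields: AttrValue) -> dict[str, AttrValue]:
    """Prefix keyword fields with "trust." and reject keys outside the known set."""
    attrs = {f"trust.{name}": value for name, value in fields.items()}
    unknown = sorted(set(attrs) - set(TRUST_ATTRIBUTE_KEYS))
    if unknown:
        raise KeyError(f"unknown trust attributes: {unknown}")
    return attrs
```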
Signer caveats & guidance
FleetForge ships flexible signer backends (env keypairs, CLI shims, or cloud KMS). Azure Key Vault currently lacks Ed25519 support: Microsoft’s key-type matrix lists only RSA keys and the ECDSA curves P-256, P-256K, P-384, and P-521. Prefer RSA-PSS (RSASSA-PSS, e.g. PS256) or ECDSA (ES256/ES384) when targeting Azure-backed surfaces. Local development defaults to `env-ed25519`, while AWS/GCP KMS backends continue to support Ed25519 for production workloads. See the dedicated Signer Profiles page for tested recipes, environment variables, and verification commands per cloud.
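The backend guidance for signers can be captured as a small lookup. In this Python sketch:

- `env-ed25519` comes from this page;
- `aws-kms`, `gcp-kms`, and `azure-kv` are hypothetical backend labels;
- algorithm names use JOSE identifiers (`EdDSA`, `ES256`, `ES384`, `PS256`).

The Signer Profiles page stays authoritative for what each backend accepts.

```python
# Candidate algorithms per signer backend, most preferred first (illustrative).
SUPPORTED = {
    "env-ed25519": ("EdDSA",),
    "aws-kms": ("EdDSA", "ES256", "PS256"),
    "gcp-kms": ("EdDSA", "ES256", "PS256"),
    "azure-kv": ("ES256", "ES384", "PS256"),  # no Ed25519 in Azure Key Vault
}


def pick_algorithm(backend: str, preferred: str = "EdDSA") -> str:
    """Return `preferred` if the backend supports it, else the backend's first choice."""
    try:
        algs = SUPPORTED[backend]
    except KeyError:
        raise ValueError(f"unknown signer backend: {backend}") from None
    return preferred if preferred in algs else algs[0]
```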
Who We Serve & Required Outcomes
- Primary buyers: platform teams running >=10 agents with cross-team reuse and formal compliance obligations.
- Enterprise wedge: regulated sectors (finance, healthcare, public) that require audit-ready provenance for every decision path.
- Builders: product teams who need to ship safely without abandoning their existing authoring frameworks.
Required outcomes:
- No untrusted actions - unsigned or unauthorized calls fail closed and emit policy explanations.
- No provenance gaps - every step emits an attestation binding identity -> input -> tool -> output.
- No heisenbugs - deterministic replays with attestations and OpenTelemetry spans prove exactly what happened.
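An in-toto statement can show what an identity -> input -> tool -> output binding looks like. The sketch below builds an unsigned in-toto v1 Statement for one step. The output is the subject, and the predicate records the agent, capability token, tool, and input digest. The predicate type URI and predicate fields are placeholders, not FleetForge's schema. In practice the statement would be wrapped in a DSSE envelope and signed.

```python
import hashlib
import json


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def step_statement(agent: str, token_id: str, tool_id: str,
                   tool_input: dict, output: bytes, output_name: str) -> dict:
    """Build an unsigned in-toto v1 Statement binding identity, input, tool, and output."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject is the artifact being attested: the step's output.
        "subject": [{"name": output_name, "digest": {"sha256": _sha256(output)}}],
        # Placeholder predicate type; not FleetForge's actual URI.
        "predicateType": "https://example.com/fleetforge/step/v1",
        "predicate": {
            "identity": {"agent": agent, "capability_token_id": token_id},
            "tool_id": tool_id,
            "input_sha256": _sha256(json.dumps(tool_input, sort_keys=True).encode()),
        },
    }
```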
Core Capabilities
GA Baseline
- Adapter coverage for LangGraph, AutoGen, CrewAI, and bespoke orchestrators so every step emits signed in-toto predicates (tool calls) and C2PA credentials (user artifacts) without forcing a framework switch.
- Policy packs compiled from OPA/Rego to Wasm, aligned to OWASP LLM Top-10 categories and mapped to the relevant NIST AI-600-1 control IDs.
- Trust graph + replay store that fuses an event-sourced log with an attestation vault; deterministic replays always resolve inputs via attestation IDs.
- OpenTelemetry GenAI spans enriched with the canonical `trust.*` fields (`trust.attestation_ids`, `trust.attestation_id`, `trust.subject`, `trust.policy_decision_id`, `trust.capability_token_id`) so telemetry, policy, and evidence stay linked.
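Resolving replay inputs via attestation IDs can be sketched as a store keyed by those IDs. In this Python illustration, the class and function names are hypothetical and the store is in memory rather than an attestation vault. Replay mode never calls the live tool. It raises an error when no recorded output exists instead of silently re-executing.

```python
from typing import Any, Callable


class ReplayStore:
    """Recorded step outputs keyed by attestation ID (in-memory stand-in)."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}

    def record(self, attestation_id: str, output: Any) -> None:
        self._records[attestation_id] = output

    def resolve(self, attestation_id: str) -> Any:
        if attestation_id not in self._records:
            raise LookupError(f"no recorded output for {attestation_id}")
        return self._records[attestation_id]


def run_step(store: ReplayStore, attestation_id: str,
             live_call: Callable[[], Any], replay: bool) -> Any:
    """Live mode calls the tool and records; replay mode only resolves recorded output."""
    if replay:
        return store.resolve(attestation_id)
    output = live_call()
    store.record(attestation_id, output)
    return output
```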
Expansion Arcs
- TrustOps release gates that require expected attestations, evaluation coverage, and replay diffs before canaries scale out (see “TrustOps Release Governance”).
- Optional SCITT transparency feeds plus cross-cloud signing paths so auditors can verify evidence externally (see “Transparency & Enterprise Controls”).
- Cross-framework portability tests and multi-cloud KMS support to prove attestations stay consistent when customers mix runtimes or rotate signing backends (see “Attestation Fabric & Capability Guard”).
Trust Mesh GA Execution Plan
The detailed GA delivery tracks live in `docs/roadmap/roadmap-and-status.md` so they are maintained in one place. That page carries the canonical execution breakdown (Policy interoperability, Portable capabilities, Deterministic replay controls, and Observability & transparency) plus status notes. Readiness for each track is recorded only on the Status & Acceptance tracker (see the rows for Policy interoperability, Portable capability credentials, Deterministic replay controls, and Telemetry compatibility + transparency), so “green” is declared in one place.
Architecture Commitments
- Attestation envelope: every step (pre-LLM, tool call, retrieval, post-LLM) emits a signed attestation referencing inputs, artifacts, versions, and policy decisions using SLSA/in-toto predicates, with C2PA credentials for user-facing outputs.
- Policy hooks (OPA -> Wasm): enforce prompt and budget checks before LLM calls, apply throttles and model switches mid-run, validate outputs post-LLM, and extend the same hooks to merge-time TrustOps gates.
- OpenTelemetry everywhere: continue emitting GenAI spans and metrics augmented with trust fields while preserving existing redaction and PII handling behavior.
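A pre-LLM budget hook can keep the requested model, switch to a cheaper tier, or deny the call. The Python sketch below stands in for what a Rego policy compiled to Wasm would decide. The model names, per-1K-token prices, and `RunBudget` type are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical model tiers, most capable first: (model name, USD per 1K tokens).
MODEL_TIERS = [("large-model", 0.03), ("small-model", 0.002)]


@dataclass
class RunBudget:
    limit_usd: float
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd


def pre_llm_hook(budget: RunBudget, est_tokens: int, requested: str) -> tuple[str, str]:
    """Return (decision, model) where decision is "allow", "switch", or "deny"."""
    names = [name for name, _ in MODEL_TIERS]
    # Only consider the requested tier and the cheaper tiers after it.
    for name, usd_per_1k in MODEL_TIERS[names.index(requested):]:
        if est_tokens / 1000 * usd_per_1k <= budget.remaining_usd:
            return ("allow" if name == requested else "switch", name)
    return ("deny", requested)  # no affordable tier: fail closed
```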
Key Design Choices (and why they sell)
- Standards-anchored trust: OWASP and NIST controls plus provenance standards (C2PA, SLSA/in-toto, SCITT) make the guarantees legible to security and compliance buyers; the value is evidence others can verify, not just our own logs.
- Policy-as-code: Rego compiled to Wasm keeps enforcement fast, portable, and reviewable, letting teams audit policies like code and run them at every hop.
- Deterministic replay with attestations: replays pin to attestation IDs so audits can prove sameness across incidents, regressions, and release validations.
Acceptance Tests (First Green Bar)
The Status & Acceptance page is the single source of truth for readiness criteria. Its Hello Fleet walkthrough plus the checkpoint table explicitly cover the five North Star requirements—attested replay (Deterministic replay controls), policy coverage (Policy everywhere), output provenance (C2PA Content Credentials), SCITT compatibility (SCITT feed and Attestation vault API), and OTEL trust attributes (OpenTelemetry GenAI spans)—so “First Green Bar” is defined once and exercised end-to-end there.
Metrics
- Trust: percentage of steps with valid attestations, policy catch rate, and mean time to provenance evidence.
- Delivery: replay success rate, SLO misses under load, and cost per successful task.
- Developer experience: time to first attested run and adapter install-to-first replay.
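The trust metrics reduce to simple ratios once step records exist. This Python sketch assumes each step carries three flags: whether its attestation is valid, whether it is a labeled policy violation, and whether the gate blocked it. Two definitions are assumptions: catch rate as blocked violations over labeled violations, and time to evidence measured per incident query.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StepRecord:
    attestation_valid: bool  # signature and chain verified
    policy_violation: bool   # labeled violation (e.g. from red-team review)
    policy_blocked: bool     # the gate actually blocked the step


def trust_metrics(steps: list[StepRecord], evidence_latencies_s: list[float]) -> dict[str, float]:
    """Compute attested-step share, policy catch rate, and mean time to evidence."""
    if not steps or not evidence_latencies_s:
        raise ValueError("need at least one step and one evidence latency")
    attested_pct = 100.0 * sum(s.attestation_valid for s in steps) / len(steps)
    violations = [s for s in steps if s.policy_violation]
    # With no labeled violations there is nothing to catch; report a perfect rate.
    catch_rate = sum(s.policy_blocked for s in violations) / len(violations) if violations else 1.0
    return {
        "attested_steps_pct": attested_pct,
        "policy_catch_rate": catch_rate,
        "mean_time_to_evidence_s": mean(evidence_latencies_s),
    }
```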
Pricing Thesis
- Free: single workspace with baseline trust (local signing) and 7-day retention.
- Pro: per-run billing, managed policy packs, hosted attestation vault, and 30-day retention.
- Enterprise: SSO/SAML, VPC deployments, external KMS/HSM support, SCITT publishing, and audit exports.
One-liner, Demo, Moat
- One-liner: FleetForge is the Trust Mesh for AI workflows and agent fleets--identity, provenance, and policy with one-click attested replays.
- 3-minute demo: run an agent, trigger a prompt-injection that the policy gate blocks with a signed decision, force a budget-driven model switchover, produce a C2PA-signed output, replay the run with attestation diffs, then attempt a PR merge that TrustOps blocks until evals and attestations pass.
- Moat: the compounding trust graph plus telemetry becomes the organization's system of record; replacing FleetForge means losing provenance history and the gating muscle built on it.
GTM Guardrails & Execution Focus
- Day-1 value metrics: adapter install in <30 minutes, first signed trace + replay in <60 minutes, first policy catch in <24 hours, first TrustOps gate tied to CI/CD or deployment in <7 days.
- Policy pack strategy: maintain open default packs aligned to OWASP and NIST; offer premium/vertical packs (HIPAA, PCI, FedRAMP) with enterprise support.
- Telemetry leverage: partner with observability vendors via OpenTelemetry to showcase attested traces; co-marketing on Trust Mesh evidence streams.
- Proof beats pitch: publish reference architectures demonstrating C2PA-enabled outputs, SLSA/in-toto promotion chains, and SCITT feed integrations.
- Pricing guardrail: usage-based (per run / per workspace) with premium tiers unlocked by advanced TrustOps automation, compliance packs, and extended retention.
FleetForge wins when every agent fleet treats the Trust Mesh as mandatory infrastructure--security trusts the guardrails, compliance trusts the receipts, and builders trust that iteration stays fast.