Phase 6 – Telemetry Integrations
Phase 6 aligns FleetForge telemetry with the OpenTelemetry GenAI semantic conventions and extends exports for Langfuse/LangSmith/Phoenix so operators get SLO dashboards out of the box.
Highlights
- GenAI spans now tag requests with gen_ai.system, gen_ai.operation.name, gen_ai.request.model, gen_ai.response.model, and duration/usage attributes for LLM, tool, and agent executions.
- Metrics exposed via OTEL counters/histograms:
  - gen_ai.prompt.tokens, gen_ai.completion.tokens, gen_ai.tokens.total
  - gen_ai.cost.usd, gen_ai.request.duration
  - fleetforge.policy.events for guardrail bursts
- LangGraph agent spans mirror LLM telemetry so adapter runs inherit the same dashboards and cost tracking.
- External exporters (Langfuse, LangSmith, Phoenix) receive the enriched payloads without additional configuration.
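As a rough illustration of the span tagging described in the highlights, the sketch below builds the attribute set for one GenAI span as a plain dictionary. The helper name `genai_span_attributes` and the example values (`"openai"`, `"chat"`, `"gpt-4o-mini"`) are hypothetical and not part of FleetForge. The duration/usage attribute keys are omitted because this document does not name them.

```python
from typing import Dict, Optional


def genai_span_attributes(
    system: str,
    operation: str,
    request_model: str,
    response_model: Optional[str] = None,
) -> Dict[str, str]:
    """Build the GenAI span attributes named in this document.

    Hypothetical helper for illustration; FleetForge's real tagging code
    may be structured differently.
    """
    attrs = {
        "gen_ai.system": system,
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": request_model,
    }
    # The response model is only known once the provider has answered.
    if response_model is not None:
        attrs["gen_ai.response.model"] = response_model
    return attrs


# Placeholder values, chosen only for the example.
attrs = genai_span_attributes("openai", "chat", "gpt-4o-mini", "gpt-4o-mini")
for key in sorted(attrs):
    print(key, "=", attrs[key])
```

A dictionary of this shape could be passed to whatever span API the runtime uses; keeping the keys in one helper makes it harder for LLM, tool, and agent spans to drift apart.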
Acceptance Criteria
- Span attributes and metrics conform to OTEL GenAI naming so downstream tools (Grafana/Langfuse/LangSmith/Phoenix) ingest them without mapping.
- Dashboards can chart SLO burn (duration + error rates), cost burn, and policy hits straight from the OTLP stream.
- GenAI metrics aggregate per gen_ai.system and gen_ai.request.model, covering both built-in LLM steps and LangGraph adapters.
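To make the "SLO burn" criterion concrete, here is a minimal sketch of the conventional error-budget burn rate: the observed bad-request ratio divided by the ratio the SLO allows. Treating a request as "bad" when it errors or exceeds a latency threshold is an assumption, as are the function names and the sample numbers; the document does not specify how FleetForge dashboards define burn.

```python
from typing import Iterable, Tuple


def count_bad(requests: Iterable[Tuple[float, bool]], latency_threshold_ms: float) -> int:
    """Count requests that errored or were slower than the threshold.

    Each request is a (duration_ms, is_error) pair.
    """
    return sum(
        1
        for duration_ms, is_error in requests
        if is_error or duration_ms > latency_threshold_ms
    )


def burn_rate(total: int, bad: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means the budget is spent exactly at the allowed pace;
    values above 1.0 mean the SLO will be missed if the rate holds.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # allowed fraction of bad requests
    return (bad / total) / error_budget


# Sample window: one slow request and one failed request out of four.
window = [(120.0, False), (950.0, False), (80.0, True), (200.0, False)]
bad = count_bad(window, latency_threshold_ms=500.0)
print(bad, burn_rate(total=len(window), bad=bad, slo_target=0.9))
```

In a dashboard the same ratio would normally be computed by the query layer over the gen_ai.request.duration histogram and an error counter; the Python version only shows the arithmetic.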
Metric Reference
| Metric | Type | Description |
|---|---|---|
| gen_ai.prompt.tokens | Counter | Prompt tokens consumed per request |
| gen_ai.completion.tokens | Counter | Completion tokens emitted per request |
| gen_ai.tokens.total | Counter | Prompt + completion tokens |
| gen_ai.cost.usd | Counter | Provider reported USD cost |
| gen_ai.request.duration | Histogram (ms) | End-to-end request latency |
| fleetforge.policy.events | Counter | Guardrail/policy decision count |
All metrics surface gen_ai.system and gen_ai.request.model labels so they can
be sliced per provider/model. Policy events carry fleetforge.policy.effect and
fleetforge.policy.pack.
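The per-provider/model slicing described above can be sketched as a simple group-by over labelled data points. The data-point shape `(metric_name, value, labels)`, the helper `slice_by_provider_model`, and every sample value (including the `"deny"` effect and `"pii"` pack) are assumptions made for illustration; a real deployment would run the equivalent grouping in the metrics backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

DataPoint = Tuple[str, float, Mapping[str, str]]


def slice_by_provider_model(
    points: Iterable[DataPoint], metric: str
) -> Dict[Tuple[str, str], float]:
    """Sum one metric per (gen_ai.system, gen_ai.request.model) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for name, value, labels in points:
        if name != metric:
            continue
        key = (labels["gen_ai.system"], labels["gen_ai.request.model"])
        totals[key] += value
    return dict(totals)


# Hypothetical samples; provider, model, and policy names are placeholders.
a = {"gen_ai.system": "openai", "gen_ai.request.model": "model-a"}
b = {"gen_ai.system": "anthropic", "gen_ai.request.model": "model-b"}
points = [
    ("gen_ai.prompt.tokens", 100, a),
    ("gen_ai.completion.tokens", 40, a),
    ("gen_ai.tokens.total", 140, a),  # prompt + completion
    ("gen_ai.tokens.total", 60, a),
    ("gen_ai.tokens.total", 300, b),
    ("gen_ai.cost.usd", 0.25, a),
    (
        "fleetforge.policy.events",
        1,
        {**a, "fleetforge.policy.effect": "deny", "fleetforge.policy.pack": "pii"},
    ),
]

print(slice_by_provider_model(points, "gen_ai.tokens.total"))
```

Because policy events carry the same two GenAI labels plus their own, the same helper also groups fleetforge.policy.events per provider and model.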
Notes
- Ensure an OTLP collector is configured (OTEL_EXPORTER_OTLP_ENDPOINT) so the new metrics reach Grafana/ClickHouse.
- Existing Langfuse/LangSmith/Phoenix exporters automatically include the added attributes; no additional configuration is required.
- Dashboards should combine the new metrics with existing queue/budget metrics to visualise end-to-end burn and SLO attainment.
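For the collector note above, a small preflight check can catch a missing or malformed OTEL_EXPORTER_OTLP_ENDPOINT before any telemetry is dropped silently. This is a sketch only: the function name `resolve_otlp_endpoint` is hypothetical, and `http://localhost:4317` is a placeholder address rather than a FleetForge default.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def resolve_otlp_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured OTLP endpoint or raise if it is unusable.

    Reads os.environ by default; a mapping can be passed in for testing.
    """
    env = os.environ if env is None else env
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    # OTLP endpoints are URLs with an http or https scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"OTEL_EXPORTER_OTLP_ENDPOINT is not a valid URL: {endpoint!r}")
    return endpoint


# Placeholder endpoint used only for the example.
print(resolve_otlp_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317"}))
```

Running a check like this at startup is a design choice, not a FleetForge requirement; it simply fails fast instead of leaving dashboards empty.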