
Phase 6 – Telemetry Integrations

Phase 6 aligns FleetForge telemetry with the OpenTelemetry GenAI semantic conventions and extends the Langfuse, LangSmith, and Phoenix exports so operators get SLO dashboards out of the box.

Highlights

  • GenAI spans now tag requests with gen_ai.system, gen_ai.operation.name, gen_ai.request.model, gen_ai.response.model, and duration/usage attributes for LLM, tool, and agent executions.
  • Metrics exposed via OTEL counters/histograms:
    • gen_ai.prompt.tokens, gen_ai.completion.tokens, gen_ai.tokens.total
    • gen_ai.cost.usd, gen_ai.request.duration
    • fleetforge.policy.events, counting guardrail/policy decisions so bursts are visible
  • LangGraph agent spans mirror LLM telemetry so adapter runs inherit the same dashboards and cost tracking.
  • External exporters (Langfuse, LangSmith, Phoenix) receive the enriched payloads without additional configuration.
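To show roughly what a single LLM, tool, or agent execution contributes to spans and metrics, the Python sketch below maps one call onto the attribute and metric names listed above. GenAiCall, span_attributes, metric_points, and metric_labels are hypothetical names, not FleetForge APIs. The usage keys gen_ai.usage.input_tokens and gen_ai.usage.output_tokens come from the upstream OTel GenAI conventions, and it is an assumption that FleetForge emits them. The sketch uses only the standard library, so it makes no OTel SDK calls.

```python
from dataclasses import dataclass
from typing import Dict, Union

AttrValue = Union[str, int, float]


@dataclass
class GenAiCall:
    """One LLM, tool, or agent execution (hypothetical record type)."""

    system: str           # provider, written to gen_ai.system
    operation: str        # e.g. "chat", "execute_tool", "invoke_agent"
    request_model: str    # model the caller asked for
    response_model: str   # model the provider reports it used
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float       # provider-reported cost
    duration_ms: float    # end-to-end latency


def span_attributes(call: GenAiCall) -> Dict[str, AttrValue]:
    """Attributes to set on the span for this execution."""
    return {
        "gen_ai.system": call.system,
        "gen_ai.operation.name": call.operation,
        "gen_ai.request.model": call.request_model,
        "gen_ai.response.model": call.response_model,
        # Assumed usage keys, taken from the upstream OTel GenAI conventions.
        "gen_ai.usage.input_tokens": call.prompt_tokens,
        "gen_ai.usage.output_tokens": call.completion_tokens,
    }


def metric_points(call: GenAiCall) -> Dict[str, float]:
    """Value each metric in the reference table would record for this call."""
    return {
        "gen_ai.prompt.tokens": call.prompt_tokens,
        "gen_ai.completion.tokens": call.completion_tokens,
        "gen_ai.tokens.total": call.prompt_tokens + call.completion_tokens,
        "gen_ai.cost.usd": call.cost_usd,
        "gen_ai.request.duration": call.duration_ms,
    }


def metric_labels(call: GenAiCall) -> Dict[str, str]:
    """Attributes attached to every metric point so dashboards can slice by provider/model."""
    return {"gen_ai.system": call.system, "gen_ai.request.model": call.request_model}
```

Take a chat call with 120 prompt tokens and 30 completion tokens. It records gen_ai.tokens.total = 150. The span and the metric points carry the same gen_ai.system and gen_ai.request.model values, and that shared key is what lets dashboards line traces up with cost and latency panels.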

Acceptance Criteria

  • Span attributes and metrics conform to OTEL GenAI naming so downstream tools (Grafana/Langfuse/LangSmith/Phoenix) ingest them without mapping.
  • Dashboards can chart SLO burn (duration + error rates), cost burn, and policy hits straight from the OTLP stream.
  • GenAI metrics aggregate per gen_ai.system and gen_ai.request.model, covering both built-in LLM steps and LangGraph adapters.
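The SLO burn criterion can be made concrete with the usual error-budget burn rate: the fraction of bad requests divided by the budget, 1 - target. The sketch below computes that rate per (gen_ai.system, gen_ai.request.model) pair in plain Python. The RequestSample record, the 99% target, and the 2000 ms latency threshold are illustrative assumptions, not FleetForge defaults. On a real dashboard, the same ratio would come from gen_ai.request.duration and error rates in the OTLP stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

ModelKey = Tuple[str, str]  # (gen_ai.system, gen_ai.request.model)


@dataclass
class RequestSample:
    """One finished request as a dashboard would see it (hypothetical record)."""

    system: str
    request_model: str
    duration_ms: float
    error: bool


def burn_rates(
    samples: Iterable[RequestSample],
    slo_target: float = 0.99,              # assumed: 99% of requests must be good
    latency_threshold_ms: float = 2000.0,  # assumed latency objective
) -> Dict[ModelKey, float]:
    """Error-budget burn rate per (gen_ai.system, gen_ai.request.model).

    A request is "bad" if it failed or exceeded the latency threshold.
    Burn rate = bad fraction / (1 - slo_target).
    """
    total: Dict[ModelKey, int] = defaultdict(int)
    bad: Dict[ModelKey, int] = defaultdict(int)
    for s in samples:
        key = (s.system, s.request_model)
        total[key] += 1
        if s.error or s.duration_ms > latency_threshold_ms:
            bad[key] += 1
    budget = 1.0 - slo_target
    return {key: (bad[key] / n) / budget for key, n in total.items()}
```

A burn rate of 1.0 uses up the error budget exactly over the SLO window. If the rate stays above 1.0, the budget runs out before the window ends, which is the signal a burn-rate alert watches for.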

Metric Reference

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| gen_ai.prompt.tokens | Counter | tokens | Prompt tokens consumed per request |
| gen_ai.completion.tokens | Counter | tokens | Completion tokens emitted per request |
| gen_ai.tokens.total | Counter | tokens | Prompt + completion tokens |
| gen_ai.cost.usd | Counter | USD | Provider-reported cost |
| gen_ai.request.duration | Histogram | ms | End-to-end request latency |
| fleetforge.policy.events | Counter | decisions | Guardrail/policy decision count |
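The table uses two instrument kinds. A counter only ever adds non-negative amounts. The duration histogram sorts each observation into explicit buckets. The stdlib-only model below makes that distinction concrete. The class names, the unit strings, and the bucket boundaries are assumptions for illustration only; a real deployment would create these instruments through the OTel SDK.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MonotonicCounter:
    """In-memory stand-in for an OTel Counter: a running sum of non-negative adds."""

    name: str
    unit: str
    total: float = 0.0

    def add(self, amount: float) -> None:
        # Counters are monotonic; this sketch raises on negative increments to surface misuse.
        if amount < 0:
            raise ValueError(f"{self.name}: counter increments must be non-negative")
        self.total += amount


@dataclass
class ExplicitBucketHistogram:
    """In-memory stand-in for an OTel Histogram with explicit bucket boundaries."""

    name: str
    unit: str
    boundaries: List[float]
    counts: List[int] = field(init=False)
    sum: float = 0.0

    def __post_init__(self) -> None:
        # One bucket per boundary plus a final overflow bucket.
        self.counts = [0] * (len(self.boundaries) + 1)

    def record(self, value: float) -> None:
        # Bucket i covers (boundaries[i-1], boundaries[i]]; bisect_left keeps upper bounds inclusive.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1
        self.sum += value


Instrument = Union[MonotonicCounter, ExplicitBucketHistogram]


def build_instruments() -> Dict[str, Instrument]:
    """Instruments mirroring the metric reference table (units and buckets are assumed)."""
    counters = {
        "gen_ai.prompt.tokens": "{token}",
        "gen_ai.completion.tokens": "{token}",
        "gen_ai.tokens.total": "{token}",
        "gen_ai.cost.usd": "USD",
        "fleetforge.policy.events": "{event}",
    }
    instruments: Dict[str, Instrument] = {
        name: MonotonicCounter(name, unit) for name, unit in counters.items()
    }
    instruments["gen_ai.request.duration"] = ExplicitBucketHistogram(
        "gen_ai.request.duration", "ms", [100, 250, 500, 1000, 2500, 5000, 10000]
    )
    return instruments
```

Because bucket counts are stored rather than raw samples, latency percentiles on a dashboard are estimated from the bucket edges. Boundaries should therefore sit close to the latency thresholds the SLO cares about.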

All metrics carry gen_ai.system and gen_ai.request.model attributes so they can be sliced per provider and model. Policy events additionally carry fleetforge.policy.effect and fleetforge.policy.pack.
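Policy events carry fleetforge.policy.effect and fleetforge.policy.pack, so a policy-hit panel is essentially a group-by over those two attributes. The sketch below applies that grouping to plain attribute dictionaries. policy_breakdown is a hypothetical helper, and example values such as "pii" or "deny" are made up.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def policy_breakdown(events: Iterable[Mapping[str, str]]) -> "Counter[Tuple[str, str]]":
    """Count policy decisions per (fleetforge.policy.pack, fleetforge.policy.effect).

    Each event is the attribute set attached to one policy decision; missing
    attributes are bucketed as "unknown" so instrumentation gaps stay visible.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for attrs in events:
        pack = attrs.get("fleetforge.policy.pack", "unknown")
        effect = attrs.get("fleetforge.policy.effect", "unknown")
        counts[(pack, effect)] += 1
    return counts
```

Events that lack an attribute go into an "unknown" bucket instead of being dropped. That way a misconfigured guardrail still shows up on the dashboard rather than disappearing from the totals.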

Notes

  • Set OTEL_EXPORTER_OTLP_ENDPOINT to the address of an OTLP collector so the new metrics reach Grafana/ClickHouse.
  • Existing Langfuse/LangSmith/Phoenix exporters automatically include the added attributes; no additional configuration required.
  • Dashboards should combine the new metrics with existing queue/budget metrics to visualise end-to-end burn and SLO attainment.
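When metrics fail to appear, it helps to know which URL the exporter is actually targeting. The sketch below follows the endpoint rules in the OTel exporter specification. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is used verbatim when set. Otherwise OTEL_EXPORTER_OTLP_ENDPOINT is used, and for OTLP/HTTP /v1/metrics is appended to it. The localhost:4317 (gRPC) and localhost:4318 (HTTP) fallbacks are the specification defaults. resolve_metrics_endpoint is a hypothetical helper, and individual SDKs may pick a different default protocol.

```python
import os
from typing import Mapping, Optional


def resolve_metrics_endpoint(
    env: Optional[Mapping[str, str]] = None,
    protocol: Optional[str] = None,
) -> str:
    """Return the URL an OTLP metrics exporter would target, per the OTel env-var rules."""
    env = os.environ if env is None else env
    # Spec-recommended default protocol; individual SDKs may default differently.
    protocol = protocol or env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")

    # A signal-specific endpoint is used exactly as given.
    specific = env.get("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT")
    if specific:
        return specific

    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if protocol == "grpc":
        return base or "http://localhost:4317"

    # OTLP/HTTP appends the per-signal path to the generic endpoint.
    base = base or "http://localhost:4318"
    return base.rstrip("/") + "/v1/metrics"
```

Calling it with no arguments checks the current process environment. If the result is a localhost default, OTEL_EXPORTER_OTLP_ENDPOINT is probably not set where the workers run.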