Your AI calls already emit OpenTelemetry (OTel): your LLM gateway exports it, and it’s the open standard your own services can speak. What you don’t have is a place to turn those spans into spend you can allocate to an outcome.

Now you can. CloudZero exposes an OpenTelemetry endpoint that doesn’t care what’s on the other end. Send it GenAI OTel that conforms to the gen_ai semantic conventions, whether from LiteLLM, another gateway, or your own instrumentation, and it lands in the same allocation engine. You can monitor it within seconds, break it out by model, provider, agent, and user within minutes, and fully allocate it within hours, to finance-grade accuracy. There’s no proprietary agent and no lock-in.
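For teams doing their own instrumentation, here is a minimal sketch of what conforming gen_ai telemetry can look like on the wire: one chat span encoded as OTLP/JSON and POSTed to an OTLP/HTTP traces endpoint, using only the Python standard library. The endpoint URL, the `Authorization` header, and the helper names (`build_gen_ai_payload`, `send`) are hypothetical placeholders, not CloudZero’s documented API; in practice an OpenTelemetry SDK or your gateway’s exporter would usually produce this for you. The gen_ai conventions are still marked as in development, and attribute names have shifted between versions (older spans carry `gen_ai.system` where newer ones use `gen_ai.provider.name`).

```python
"""Hypothetical sketch: build and send an OTLP/JSON payload for one gen_ai span.

Assumptions (not from CloudZero docs): the endpoint URL, the auth header,
and exactly which attributes CloudZero requires.
"""
import json
import os
import time
import urllib.request


def _attr_str(key: str, value: str) -> dict:
    return {"key": key, "value": {"stringValue": value}}


def _attr_int(key: str, value: int) -> dict:
    # OTLP/JSON encodes 64-bit integers as decimal strings.
    return {"key": key, "value": {"intValue": str(value)}}


def build_gen_ai_payload(provider: str, model: str,
                         input_tokens: int, output_tokens: int,
                         service_name: str = "my-ai-service") -> dict:
    """Return an OTLP/JSON trace export request holding one CLIENT span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 1_500_000_000  # pretend the call took 1.5 s
    span = {
        "traceId": os.urandom(16).hex(),  # OTLP/JSON uses hex, not base64
        "spanId": os.urandom(8).hex(),
        "name": f"chat {model}",          # "{operation} {model}" naming
        "kind": 3,                        # SPAN_KIND_CLIENT
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            _attr_str("gen_ai.operation.name", "chat"),
            _attr_str("gen_ai.provider.name", provider),
            _attr_str("gen_ai.request.model", model),
            _attr_int("gen_ai.usage.input_tokens", input_tokens),
            _attr_int("gen_ai.usage.output_tokens", output_tokens),
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [_attr_str("service.name", service_name)]},
            "scopeSpans": [{"scope": {"name": "manual-gen-ai-example"},
                            "spans": [span]}],
        }]
    }


def send(payload: dict, endpoint: str, api_key: str) -> int:
    """POST the payload to an OTLP/HTTP traces endpoint; return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": api_key},  # hypothetical auth header
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_gen_ai_payload("openai", "gpt-4o", 1200, 350)
    print(json.dumps(payload, indent=2))
    # Placeholder endpoint and key; substitute your real values before sending:
    # send(payload, "https://<your-otel-endpoint>/v1/traces", "<api-key>")
```

The payload is built separately from the network call so you can inspect or test it without sending anything.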

Your observability vendor already takes that same telemetry and hands you latency charts and traces. It won’t turn a single span into a dollar allocated to a customer. Same data; it’s just a question that tool was never built to answer.

Why this matters

Building on the open standard means you’re not betting on a single capture method or a proprietary agent. Anything that speaks the gen_ai conventions works the day you point it at CloudZero; there’s nothing for us to build to onboard your source. But the capture method is only part of the story. The engine downstream is what should get your attention.

What we built

We built an OpenTelemetry ingest endpoint and a streaming pipeline designed for the volume and shape of AI telemetry. It parses the OTel gen_ai semantic conventions and feeds CostFormation, the allocation engine that has run CloudZero in production for 10 years. We conform to the spec (the same one LiteLLM formats to), so any source that emits compliant gen_ai OTel flows in with no custom work on our side.

How design partners use it

It’s available to design partners now. Today most of them point a LiteLLM gateway at the endpoint, but anything that emits compliant gen_ai OTel works the same way. There’s no app to rebuild and no agent of ours to deploy: it’s the telemetry you already produce, allocated to outcomes.

Talk to us about the design partner program.