Developer Docs

Integrate in 5 minutes.

One-line change. Works with every LLM provider. Drop-in compatible with the OpenAI SDK, Anthropic SDK, LangChain, and more.

Examples in Python, Node.js, and curl:

Python

import openai

# Before Trimio
client = openai.OpenAI(
    api_key="sk-..."
)

# After Trimio — one line change
client = openai.OpenAI(
    api_key="sk-...",
    base_url="https://api.trimio.ai/v1"
)

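# The call itself is unchanged; only the client's base URL differs
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)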

Node.js

import OpenAI from 'openai';

// After Trimio — one line change
const client = new OpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://api.trimio.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }]
});

curl

curl https://api.trimio.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
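
If your stack has no SDK, the same request can be assembled with Python's standard library. This is a minimal sketch: build_chat_request is a hypothetical helper name, the key is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TRIMIO_CHAT_URL = "https://api.trimio.ai/v1/chat/completions"


def build_chat_request(api_key, content, model="gpt-4o"):
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": content}]}
    ).encode("utf-8")
    return urllib.request.Request(
        TRIMIO_CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("sk-...", "Hello!")
print(request.get_method(), request.full_url)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```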

Getting Started

Live in three steps.

Get your virtual key

Change one URL

Point your SDK's base_url (baseURL in the Node.js SDK) to https://api.trimio.ai/v1. That's it.
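
One way to keep the switch out of application code is to read the endpoint from configuration. A minimal sketch, assuming the environment variable names LLM_API_KEY and LLM_BASE_URL, which are illustrative and not defined by Trimio; client_kwargs is likewise a hypothetical helper.

```python
import os

DEFAULT_BASE_URL = "https://api.trimio.ai/v1"


def client_kwargs(env=None):
    """Build keyword arguments for openai.OpenAI(...) from the environment.

    LLM_API_KEY and LLM_BASE_URL are assumed names used for illustration only.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["LLM_API_KEY"],
        "base_url": env.get("LLM_BASE_URL", DEFAULT_BASE_URL),
    }


# Demo with an explicit mapping instead of the real environment
kwargs = client_kwargs({"LLM_API_KEY": "sk-..."})
print(kwargs["base_url"])

# With the OpenAI Python SDK installed:
# client = openai.OpenAI(**kwargs)
```

Rolling back is then a configuration change rather than a code change.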

Watch savings accumulate

The dashboard populates in real time, and your first savings report arrives within 30 days. No maintenance is required from your team.

Features

What you get out of the box.

OBSERVABILITY

Full Request Logging

Every request is logged with 40+ fields, including latency, tokens, model, cache status, cost, and savings. Native Prometheus and OpenTelemetry (OTEL) support.
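
To give a feel for what a per-request record could contain, here is a small sketch. The field names, types, and values are assumptions for illustration, based only on the fields named in this section; the actual log schema is not documented here.

```python
from dataclasses import dataclass, asdict


@dataclass
class RequestLog:
    # Hypothetical field names; only a handful of the 40+ fields are modeled
    model: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    cache_status: str  # assumed values: "hit" or "miss"
    cost_usd: float
    savings_usd: float


# Made-up sample records, not real measurements
logs = [
    RequestLog("gpt-4o", 812.0, 1200, 150, "hit", 0.004, 0.012),
    RequestLog("gpt-4o", 950.0, 1200, 140, "miss", 0.016, 0.0),
]

total_savings = sum(entry.savings_usd for entry in logs)
hit_count = sum(1 for entry in logs if entry.cache_status == "hit")
print(asdict(logs[0]))
print(f"cache hits: {hit_count}, total savings: ${total_savings:.3f}")
```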

RELIABILITY

Automatic Fallbacks

If a provider is down, requests automatically route to the next best option. Zero downtime, zero code changes.
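
The idea behind fallback routing can be sketched in a few lines. This illustrates the general pattern only, not Trimio's routing logic: ProviderDown, call_with_fallback, and the two stand-in providers are hypothetical names.

```python
class ProviderDown(Exception):
    """Raised by a provider callable when the upstream is unavailable."""


def call_with_fallback(providers, prompt):
    """Try each (name, send) pair in order and return the first success."""
    errors = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt):
    # Stand-in for a provider that is currently down
    raise ProviderDown("503 Service Unavailable")


def secondary(prompt):
    # Stand-in for a healthy provider
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    [("primary", primary), ("secondary", secondary)], "Hello!"
)
print(used, reply)  # secondary echo: Hello!
```

Because the retry happens behind the gateway endpoint, the calling code sees a single successful response.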

CACHING

Provider Cache Optimization

Sophisticated cache-aware request handling that maximizes hit rates against Anthropic, OpenAI, and Google's native prompt caching. 93% average token savings on cache hits. No Trimio-side cache state to manage.
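
The 93% figure applies to cache hits, so savings across all traffic depend on how often requests hit the cache. A rough sketch of that arithmetic, assuming requests are of similar size and that misses save nothing; the hit rates below are hypothetical, not measured values.

```python
def blended_savings(hit_rate, savings_on_hit=0.93):
    """Estimate overall token savings from the cache hit rate.

    Assumes similar-sized requests and zero savings on cache misses.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * savings_on_hit


# Hypothetical hit rates, for illustration only
for rate in (0.25, 0.5, 0.75):
    print(f"hit rate {rate:.0%}: blended savings {blended_savings(rate):.3f}")
```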

GOVERNANCE

Rate Limiting

Per-key and per-team rate limits. Prevent runaway scripts from generating surprise bills overnight.
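
Per-key limits are commonly built on a token bucket. The sketch below shows that general technique under assumed numbers (1 request per second, burst of 2); it is not a description of how Trimio enforces limits, and TokenBucket and allow_request are hypothetical names.

```python
import time


class TokenBucket:
    """Token bucket: refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec, capacity, now=None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets = {}  # one bucket per key


def allow_request(key, now=None, rate_per_sec=1.0, capacity=2):
    if key not in buckets:
        buckets[key] = TokenBucket(rate_per_sec, capacity, now)
    return buckets[key].allow(now)


# A burst of three requests at t=0: the third exceeds the burst capacity
print([allow_request("team-a", now=0.0) for _ in range(3)])  # [True, True, False]
# One second later a token has been refilled
print(allow_request("team-a", now=1.0))  # True
```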

BUDGETS

Budget Controls

Set monthly spend limits per team, project, or key. Get alerted before limits are hit — not after.
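
The alert-before-the-limit behavior amounts to a threshold check. A minimal sketch, assuming an alert threshold of 80% of the limit and that requests are blocked once the limit is reached; both are illustrative assumptions, and budget_status is a hypothetical name.

```python
def budget_status(spent, limit, alert_at=0.8):
    """Classify month-to-date spend against a limit.

    alert_at is the assumed fraction of the limit at which an alert fires.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    used = spent / limit
    if used >= 1.0:
        return "blocked"
    if used >= alert_at:
        return "alert"
    return "ok"


print(budget_status(500, 1000))   # ok
print(budget_status(850, 1000))   # alert
print(budget_status(1000, 1000))  # blocked
```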

SECURITY

Virtual Key Governance

Issue scoped virtual keys per team. Rotate and revoke without touching provider credentials.
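
Conceptually, a virtual key is an indirection in front of the real provider credential. The sketch below illustrates that idea only: VirtualKeyStore, its methods, and the vk- prefix are hypothetical and do not reflect Trimio's API or key format.

```python
import secrets


class VirtualKeyStore:
    """Maps virtual keys to teams; the provider key itself never changes."""

    def __init__(self, provider_key):
        self._provider_key = provider_key
        self._teams = {}  # virtual key -> team name

    def issue(self, team):
        virtual_key = "vk-" + secrets.token_hex(8)  # assumed key format
        self._teams[virtual_key] = team
        return virtual_key

    def revoke(self, virtual_key):
        self._teams.pop(virtual_key, None)

    def rotate(self, virtual_key):
        # Replace the old virtual key with a fresh one for the same team
        team = self._teams.pop(virtual_key)
        return self.issue(team)

    def resolve(self, virtual_key):
        if virtual_key not in self._teams:
            raise PermissionError("unknown or revoked virtual key")
        return self._teams[virtual_key], self._provider_key


store = VirtualKeyStore(provider_key="sk-...")
old_key = store.issue("search-team")
new_key = store.rotate(old_key)
print(store.resolve(new_key))  # ('search-team', 'sk-...')
```

Rotating or revoking a virtual key changes only the mapping, so the provider credential stays untouched.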

Observability

Every metric, every request.

40+ fields per request

<20ms added latency

99.99% uptime SLA

1,600+ LLMs supported
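
To put the 99.99% uptime figure in perspective, it allows roughly 4.3 minutes of downtime in a 30-day month, or about 52.6 minutes in a 365-day year. The short calculation below shows the arithmetic; allowed_downtime_minutes is a hypothetical helper, and how the SLA window is actually measured is not specified here.

```python
def allowed_downtime_minutes(sla_percent, days):
    """Minutes of downtime permitted by an uptime percentage over a period."""
    return (1 - sla_percent / 100) * days * 24 * 60


print(f"{allowed_downtime_minutes(99.99, 30):.2f} minutes per 30-day month")   # 4.32
print(f"{allowed_downtime_minutes(99.99, 365):.2f} minutes per 365-day year")  # 52.56
```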

SDK & Framework Compatibility

SDK / Framework	Status	Notes
OpenAI Python SDK	Drop-in	Change base_url only
OpenAI Node.js SDK	Drop-in	Change baseURL only
Anthropic SDK	Drop-in	OpenAI-compat endpoint
LangChain	Drop-in	Works with all LLM wrappers
LlamaIndex	Drop-in	All model integrations
curl / HTTP	Full	Standard OpenAI REST API

Exports natively to your existing stack:

Prometheus, Grafana, OpenTelemetry, Datadog, LangChain, LlamaIndex, TimescaleDB
