Skip to content

Cost Routing

Cost routing is the system that answers one question:

What’s the cheapest model that can do this task well?


Every available model is assigned a tier based on capability and cost:

TierModelsCostUse for
L4 — LocalLlama, Mistral, Qwen (Ollama)FreePrototyping, simple classification
L3 — LightGPT-4o-mini, Claude Haiku, DeepSeek-V3-lite~$0.15/M inputSummarization, simple text, fact-checking
L2 — StandardDeepSeek-V3, Claude Sonnet, GPT-4o~$0.27-3/M inputGeneral problem-solving, research, writing
L1 — HeavyDeepSeek-R1, Claude Opus~$0.55/M+ inputComplex reasoning, architecture, debugging

User asks a question
1. CLASSIFY — a cheap model (L4) classifies complexity
├── "trivial" → L4 (local, free)
├── "simple" → L3 (cheapest API)
├── "moderate" → L2 (standard)
└── "complex" → L1 (best available)
2. ROUTE — the task goes to the selected tier
3. ESCALATE — if the model fails, move up one tier

If a cheaper model fails (incomplete answer, wrong format, hallucination flag), the system automatically tries the next tier:

L4 failed → try L3
L3 failed → try L2
L2 failed → try L1
L1 failed → report error

This means:

  • 80% of tasks are solved by L3 or L4 (nearly free)
  • 18% need L2 (moderate cost)
  • ~2% actually need L1 (premium cost)

The result: premium accuracy at a fraction of the price.


Every swarm has a budget:

cost_budget:
max_total_usd: 2.50 # Hard cap for the entire swarm
max_per_agent_usd: 0.80 # Cap per individual agent

When the budget is 95% spent, the swarm stops — no surprise bills.


Two levels of caching reduce costs further:

  1. Exact match — identical prompt + model + parameters → cached response (free)
  2. Semantic match — very similar question (embedding similarity > 0.97) → cached response

For commonly asked questions across many swarms, this adds up quickly.


Every swarm returns a detailed cost report:

{
"swarm_id": "abc-123",
"total_cost_usd": 0.067,
"total_tokens": 16500,
"agents": [
{
"agent_id": "manager",
"model": "deepseek-v3",
"calls": 3,
"cost_usd": 0.012
}
],
"cache_hits": 2,
"cache_savings_usd": 0.008,
"escalations": 0
}

You always know exactly what you spent — and what you saved.