Cost Routing
Cost routing is the system that answers one question:
What’s the cheapest model that can do this task well?
The L1–L4 tier system
Section titled “The L1–L4 tier system”Every available model is assigned a tier based on capability and cost:
| Tier | Models | Cost | Use for |
|---|---|---|---|
| L4 — Local | Llama, Mistral, Qwen (Ollama) | Free | Prototyping, simple classification |
| L3 — Light | GPT-4o-mini, Claude Haiku, DeepSeek-V3-lite | ~$0.15/M input | Summarization, simple text, fact-checking |
| L2 — Standard | DeepSeek-V3, Claude Sonnet, GPT-4o | ~$0.27-3/M input | General problem-solving, research, writing |
| L1 — Heavy | DeepSeek-R1, Claude Opus | ~$0.55/M+ input | Complex reasoning, architecture, debugging |
How it works
Section titled “How it works”User asks a question │ ▼1. CLASSIFY — a cheap model (L4) classifies complexity │ ├── "trivial" → L4 (local, free) ├── "simple" → L3 (cheapest API) ├── "moderate" → L2 (standard) └── "complex" → L1 (best available) │ ▼2. ROUTE — the task goes to the selected tier │ ▼3. ESCALATE — if the model fails, move up one tierEscalation
Section titled “Escalation”If a cheaper model fails (incomplete answer, wrong format, hallucination flag), the system automatically tries the next tier:
L4 failed → try L3L3 failed → try L2L2 failed → try L1L1 failed → report errorThis means:
- 80% of tasks are solved by L3 or L4 (nearly free)
- 18% need L2 (moderate cost)
- ~2% actually need L1 (premium cost)
The result: premium accuracy at a fraction of the price.
Cost budgets
Section titled “Cost budgets”Every swarm has a budget:
cost_budget: max_total_usd: 2.50 # Hard cap for the entire swarm max_per_agent_usd: 0.80 # Cap per individual agentWhen the budget is 95% spent, the swarm stops — no surprise bills.
Caching
Section titled “Caching”Two levels of caching reduce costs further:
- Exact match — identical prompt + model + parameters → cached response (free)
- Semantic match — very similar question (embedding similarity > 0.97) → cached response
For commonly asked questions across many swarms, this adds up quickly.
Cost dashboard
Section titled “Cost dashboard”Every swarm returns a detailed cost report:
{ "swarm_id": "abc-123", "total_cost_usd": 0.067, "total_tokens": 16500, "agents": [ { "agent_id": "manager", "model": "deepseek-v3", "calls": 3, "cost_usd": 0.012 } ], "cache_hits": 2, "cache_savings_usd": 0.008, "escalations": 0}You always know exactly what you spent — and what you saved.