Memory & Context
Agents in a swarm have three levels of memory, each with a different purpose and lifetime.
Short-term memory (context window)
Section titled “Short-term memory (context window)”The active conversation — what the LLM can “see” right now.
System prompt + last N messages in the conversation- Lifetime: One agent session
- Size: Up to 128K tokens (DeepSeek-V3), 200K (Claude)
- Managed by: The LLM API automatically
Long contexts are expensive. The swarm uses automatic summarization: when the context gets full, older messages are summarized by a cheap model (L3) and compressed into a single message.
Working memory (shared state)
Section titled “Working memory (shared state)”A Redis JSON document that all agents in a swarm can read and write.
{ "topic": "Norwegian hydrogen economy", "findings": [ { "source": "researcher-1", "claim": "...", "confidence": 0.9 }, { "source": "researcher-2", "claim": "...", "confidence": 0.7 } ], "status": "research_complete", "next_step": "fact_check"}- Lifetime: One swarm execution
- Access: Read/write by all agents
- Use for: Tracking progress, sharing discoveries, coordinating
This is where the swarm’s “state of mind” lives. It’s ephemeral — gone when the swarm finishes.
Long-term memory (vector store)
Section titled “Long-term memory (vector store)”A semantic search index over previous swarm results.
Qdrant collection: swarm-memory├── Embedding of each swarm's final output├── Metadata: date, swarm type, cost, model used└── Searchable by semantic similarity- Lifetime: Persistent (days, weeks, months)
- Access: Read-only (new results are appended)
- Use for: “What did we learn about X last time?”
Example: A new research swarm about hydrogen can instantly recall that a previous swarm already covered “carbon capture” and link to those findings.
Context window strategy
Section titled “Context window strategy”The default strategy is conservative — start small, expand only when needed:
- Start with the system prompt + the current task (short context)
- The agent can ask for “more context” if it needs previous messages
- Automatic summarization kicks in when approaching the token limit
- The full conversation is always available via the message bus if needed
This keeps costs low (shorter contexts = cheaper API calls) while ensuring the agent can access history when it matters.
Visual summary
Section titled “Visual summary”┌─────────────────────────────────────────────────┐│ SHORT-TERM │ WORKING │ LONG-TERM ││ (Context) │ (Redis JSON) │ (Qdrant) ││ │ │ ││ What the LLM │ What the │ What we've ││ can see NOW │ swarm KNOWS │ LEARNED before ││ │ │ ││ ~128K tokens │ Key-value │ Vector search ││ Per-agent │ Shared │ Global ││ Ephemeral │ Per-swarm │ Persistent │└─────────────────────────────────────────────────┘