Architecture
How Baker Street's components fit together in a Kubernetes-native design.
Baker Street is a distributed system where every component runs as a Kubernetes pod. Components communicate exclusively through NATS JetStream, meaning no service needs to know the network address of any other. This section describes each component and how they connect.
High-Level Overview
User --> Gateway --> Brain --> NATS JetStream --> Workers
                       |             |
                       v             v
                    Qdrant       Task Pods
                   (Memory)     (Ephemeral)
                       ^
                       |
                   Extensions
                   (MCP Pods)
A user message arrives through the Gateway (web, Telegram, or Discord), reaches the Brain, and the Brain orchestrates everything else through NATS.
Components
Brain (Orchestrator)
The Brain is the central orchestrator. Each incoming message goes through the following steps:
- Loads conversation history from SQLite
- Searches vector memory (Qdrant) for relevant facts about the user
- Assembles a system prompt from personality files and discovered extension tools
- Sends everything to Claude with the available tool set
- Claude decides what to do: answer directly, store a memory, dispatch a background job, or a combination
- If work is dispatched, the Brain publishes jobs to NATS and waits for results
- Claude can iterate up to 10 tool calls per turn, combining information from multiple sources
- The final response streams back to the user in real time via Server-Sent Events
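The bounded tool loop in the steps above can be sketched as follows. This is a minimal illustration, not Baker Street's actual code: `ModelReply`, `run_turn`, `call_model`, and `LIMIT_MESSAGE` are hypothetical names, and the model is injected as a plain function so the sketch runs without any API client. Only the cap of 10 tool calls per turn comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_TOOL_CALLS = 10  # per-turn cap stated in the text
LIMIT_MESSAGE = "Tool-call limit reached."  # hypothetical fallback reply


@dataclass
class ModelReply:
    """Hypothetical reply shape: either final text or a single tool request."""
    text: str = ""
    tool_name: Optional[str] = None
    tool_input: dict = field(default_factory=dict)


def run_turn(
    call_model: Callable[[List[dict]], ModelReply],
    tools: Dict[str, Callable[[dict], str]],
    messages: List[dict],
) -> str:
    """Call the model until it answers directly or the tool-call cap is hit."""
    for _ in range(MAX_TOOL_CALLS):
        reply = call_model(messages)
        if reply.tool_name is None:
            return reply.text  # direct answer, no tool needed
        # Run the requested tool and feed its result back into the history.
        result = tools[reply.tool_name](reply.tool_input)
        messages.append(
            {"role": "tool", "name": reply.tool_name, "content": result}
        )
    return LIMIT_MESSAGE
```

Injecting `call_model` and `tools` keeps the loop independent of any particular model client, which also makes the cap easy to test with a fake model.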
The Brain also runs an observer process. After each conversation, a lighter model (Haiku) extracts structured observations: decisions the user made, preferences they stated, facts about their environment. These compress into long-term knowledge over time.
Memory (Qdrant + Voyage AI)
The memory system has two layers working together.
Vector memory stores factual knowledge in Qdrant. When a message arrives, the Brain embeds it with Voyage AI and searches for related memories. Relevant facts are injected into the system prompt so the agent always has context. Memories are deduplicated automatically, using a cosine-similarity threshold of 0.92 (92%).
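To make the deduplication threshold concrete, here is a minimal sketch in plain Python. It assumes that a candidate counts as a duplicate when its cosine similarity to any stored vector is at or above 0.92; whether the real comparison is inclusive, and how Qdrant is queried to find candidates, is not stated in the text. `cosine_similarity` and `is_duplicate` are hypothetical helper names.

```python
import math
from typing import Iterable, Sequence

DEDUP_THRESHOLD = 0.92  # threshold from the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(
    candidate: Sequence[float],
    stored: Iterable[Sequence[float]],
    threshold: float = DEDUP_THRESHOLD,
) -> bool:
    """True if any stored vector is at least `threshold` similar (assumed inclusive)."""
    return any(cosine_similarity(candidate, v) >= threshold for v in stored)
```

In practice the vector database would return the nearest neighbors and their scores, so the linear scan here only stands in for that lookup.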
Observational memory is a higher-level system backed by SQLite. The observer extracts structured observations from conversations -- not just facts, but patterns, decisions, and context. A reflector periodically compresses these observations into abstract knowledge. This gives the agent a layered understanding: raw memories at the bottom, synthesized knowledge at the top.
Memories are organized into six categories: conversation, fact, preference, procedure, reference, and reflection.
Workers (NATS Consumers)
Workers are stateless pods that pull jobs from a NATS JetStream queue. They handle three job types:
- Agent -- Claude running on the worker with its own reasoning loop. Used for tasks that need thinking: summarizing documents, research, drafting content.
- Command -- Shell command execution with a strict allowlist. Used for quick queries: uptime, kubectl get pods, df -h.
- HTTP -- API calls from inside the cluster. Used for checking services, calling webhooks, or fetching external data.
Workers report status updates in real time (received -> running -> completed/failed). Multiple workers run in parallel, processing jobs concurrently from the shared queue. If a worker crashes mid-job, NATS redelivers the message to another worker.
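The Command job type and the status transitions can be sketched together. This is an illustration under stated assumptions: the allowlist holds only the three example commands from the text and matches the full argument vector exactly, which is one strict interpretation and not necessarily the real policy. `JobStatus`, `parse_allowed`, and `handle_command_job` are hypothetical names, and the runner is injected so nothing is actually executed here.

```python
import shlex
from enum import Enum
from typing import Callable, List


class JobStatus(str, Enum):
    RECEIVED = "received"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: exact-match allowlist built from the examples in the text.
ALLOWED_COMMANDS = {("uptime",), ("kubectl", "get", "pods"), ("df", "-h")}


def parse_allowed(command: str) -> List[str]:
    """Tokenize the command and reject anything not on the allowlist."""
    argv = shlex.split(command)
    if tuple(argv) not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command!r}")
    return argv


def handle_command_job(
    command: str, run: Callable[[List[str]], None]
) -> List[JobStatus]:
    """Return the sequence of statuses a worker would report for one job."""
    statuses = [JobStatus.RECEIVED]
    try:
        argv = parse_allowed(command)
        statuses.append(JobStatus.RUNNING)
        run(argv)  # e.g. subprocess.run(argv, check=True), never shell=True
        statuses.append(JobStatus.COMPLETED)
    except Exception:
        statuses.append(JobStatus.FAILED)
    return statuses
```

Because the command is tokenized and never passed to a shell, an input such as `uptime; rm -rf /` fails the exact-match check instead of being interpreted.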
Gateway (Multi-Channel)
The Gateway bridges external messaging platforms to the Brain's REST API. Each adapter handles platform-specific concerns: character limits, typing indicators, message splitting, and formatting. Supported channels include a React web UI, Telegram, and Discord. Each channel maps to its own conversation so context stays separate.
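Message splitting is one of the adapter concerns that is easy to show in code. The sketch below is an assumption about how an adapter might do it, not the Gateway's actual logic: it cuts at the last space or newline that fits within the limit and falls back to a hard cut when there is none. The limits in `CHANNEL_LIMITS` are the commonly documented per-message character limits for each platform and do not come from the source; `split_message` is a hypothetical name.

```python
from typing import List

# Assumption: per-message character limits, not taken from the source.
CHANNEL_LIMITS = {"telegram": 4096, "discord": 2000}


def split_message(text: str, limit: int) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut at the last space or newline that keeps the chunk
    within the limit; falls back to a hard cut if none exists.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        # Look one character past the limit so a boundary exactly at
        # position `limit` still yields a full-length chunk.
        window = remaining[: limit + 1]
        cut = max(window.rfind("\n"), window.rfind(" "))
        if cut <= 0:
            cut = limit  # no usable boundary: hard split
        chunk = remaining[:cut].rstrip()
        if chunk:
            chunks.append(chunk)
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

A Discord adapter, for example, could call `split_message(reply, CHANNEL_LIMITS["discord"])` and send each chunk as its own message.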
Task Pods (Ephemeral Execution)
Task Pods are Kubernetes Jobs launched on demand for isolated code execution. They receive instructions through NATS, execute in a locked-down environment, and report results back through NATS. Task pods have the strictest security posture: zero RBAC permissions, no Kubernetes API access, no ingress, egress limited to NATS, a 30-minute timeout, and automatic cleanup.
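Several of these constraints map directly onto fields of a Kubernetes Job. The following sketch builds such a manifest as a Python dictionary. The field names are standard Kubernetes API fields, but the label, the cleanup delay, and the `task_pod_job` helper are assumptions for illustration; the 30-minute timeout is from the text. The egress restriction would live in a separate NetworkPolicy and is not shown.

```python
def task_pod_job(name: str, image: str, timeout_seconds: int = 30 * 60) -> dict:
    """Build a locked-down Job manifest (illustrative values, not the real one)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"app": "task-pod"}},
        "spec": {
            # Kubernetes terminates the Job once this deadline passes.
            "activeDeadlineSeconds": timeout_seconds,
            # Assumption: delete the finished Job after five minutes.
            "ttlSecondsAfterFinished": 300,
            "backoffLimit": 0,
            "template": {
                "metadata": {"labels": {"app": "task-pod"}},
                "spec": {
                    "restartPolicy": "Never",
                    # No service account token mounted, so the pod holds
                    # no credentials for the Kubernetes API.
                    "automountServiceAccountToken": False,
                    "containers": [
                        {
                            "name": "task",
                            "image": image,
                            "securityContext": {
                                "runAsNonRoot": True,
                                "readOnlyRootFilesystem": True,
                                "allowPrivilegeEscalation": False,
                                "capabilities": {"drop": ["ALL"]},
                            },
                        }
                    ],
                },
            },
        },
    }
```

The resulting dictionary can be serialized to YAML or JSON, or passed to a Kubernetes client, to create the Job.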
Extensions (MCP Protocol)
Extensions are sidecar pods that speak the Model Context Protocol (MCP) over HTTP. When deployed, an extension announces itself on NATS. The Brain connects, discovers the extension's tools, and makes them available to Claude immediately -- no restarts required.
When an extension pod goes away (deleted, crashed, or scaled down), its tools disappear from the tool set. When it comes back, they reappear. Extensions are therefore hot-pluggable: capabilities can be added and removed while the Brain keeps running.
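The bookkeeping behind this behavior can be sketched as a small registry keyed by extension. This is a simplified model with hypothetical names (`ExtensionRegistry`, `announce`, `remove`); the real Brain discovers tools over MCP and learns about extensions through NATS, neither of which is modeled here.

```python
from typing import Dict, Iterable, List


class ExtensionRegistry:
    """Tracks which tools are currently offered by which extension."""

    def __init__(self) -> None:
        self._tools: Dict[str, List[str]] = {}

    def announce(self, extension_id: str, tools: Iterable[str]) -> None:
        """Called when an extension appears (or reappears) with its tool list."""
        self._tools[extension_id] = list(tools)

    def remove(self, extension_id: str) -> None:
        """Called when an extension goes away; unknown ids are ignored."""
        self._tools.pop(extension_id, None)

    def available_tools(self) -> List[str]:
        """Tool names, namespaced by extension, as offered to the model."""
        return sorted(
            f"{ext}.{tool}"
            for ext, tools in self._tools.items()
            for tool in tools
        )
```

Recomputing `available_tools()` on every turn is what would make a newly announced tool usable without a restart in this model.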
NATS JetStream (Message Bus)
NATS JetStream is the nervous system connecting everything. It provides durable delivery guarantees, load-balanced consumer groups, and pub/sub messaging. All inter-component communication flows through NATS, which means components are fully decoupled. The Brain does not know which worker will pick up a job, and workers do not know where the Brain lives.
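The delivery semantics described here can be illustrated with a small in-memory model. This is not the NATS client API: `WorkQueue` and its methods are hypothetical, and `redeliver_unacked` stands in for the acknowledgement timeout that would trigger redelivery in a real broker. The point is only that the publisher addresses a queue, any consumer may pull the next message, and an unacknowledged message becomes available again.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class WorkQueue:
    """In-memory stand-in for a durable work queue with explicit acks."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str]] = deque()
        self._in_flight: Dict[int, str] = {}
        self._next_id = 1

    def publish(self, payload: str) -> int:
        """Enqueue a job; the publisher never names a consumer."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending.append((msg_id, payload))
        return msg_id

    def pull(self) -> Optional[Tuple[int, str]]:
        """Hand the next job to whichever consumer asks first."""
        if not self._pending:
            return None
        msg_id, payload = self._pending.popleft()
        self._in_flight[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        """Mark a job as done so it is never delivered again."""
        self._in_flight.pop(msg_id, None)

    def redeliver_unacked(self) -> None:
        """Stand-in for an ack timeout: requeue every unacked job."""
        for msg_id, payload in sorted(self._in_flight.items()):
            self._pending.append((msg_id, payload))
        self._in_flight.clear()
```

If one worker pulls a job and crashes before calling `ack`, the next `redeliver_unacked` makes the same job available to another worker, which mirrors the crash-recovery behavior described for workers.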
Kubernetes-Native Design
Baker Street is not an application that happens to run on Kubernetes. It actively leverages Kubernetes primitives:
- Pods and Deployments for the Brain, Workers, Gateway, and Extensions
- Jobs for ephemeral Task Pods with automatic cleanup
- ConfigMaps for personality files (SOUL.md, BRAIN.md, WORKER.md)
- Secrets scoped per service so each component only sees what it needs
- Network Policies enforcing default-deny ingress on every pod
- Pod Security Contexts running non-root with read-only filesystems and dropped capabilities
- Kustomize for environment-specific configuration overlays
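As an example of one of these primitives, a default-deny ingress policy is small enough to show in full. The sketch below builds the manifest as a Python dictionary using standard NetworkPolicy fields; the policy name and the `default_deny_ingress` helper are assumptions, and the project's actual manifests may differ.

```python
def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress to pods in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty selector matches every pod in the namespace.
            "podSelector": {},
            # Declaring Ingress with no ingress rules denies all inbound traffic.
            "policyTypes": ["Ingress"],
        },
    }
```

Additional policies can then allow specific traffic per component, since NetworkPolicies are additive.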