Baker Street

Observability

OpenTelemetry traces, Prometheus metrics, and structured logging for Baker Street.

Baker Street ships with an optional observability stack that deploys to its own namespace. It shows how the agent reasons, where time is spent, and what is happening across the system.

Stack Components

The observability stack includes:

Component                Role
OpenTelemetry Collector  Receives OTLP from all services
Tempo                    Distributed trace storage
Loki                     Log aggregation
Prometheus               Metrics collection
Grafana                  Dashboards and visualization

Deploy the stack with:

kubectl apply -k k8s/overlays/observability

Distributed Tracing

Every API response includes an X-Trace-Id header. Trace context propagates through NATS messages, so a single user request can be traced across:

User Request --> Brain --> NATS --> Worker --> Tool Execution
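
To make the propagation step concrete, here is a minimal sketch of how a trace context could travel in message headers. It assumes the W3C Trace Context "traceparent" format, which OpenTelemetry uses by default; whether Baker Street uses exactly this header on NATS is an assumption, and the function names are hypothetical. A plain dict stands in for the NATS message headers.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace ID shared by every hop of one request."""
    match = TRACEPARENT_RE.match(traceparent)
    if match is None:
        raise ValueError(f"malformed traceparent: {traceparent!r}")
    return match.group(1)


def child_traceparent(parent: str) -> str:
    """Keep the trace ID and flags, mint a new span ID for the next hop."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Publisher side (hypothetical): attach the context to the outgoing message.
headers = {"traceparent": new_traceparent()}

# Consumer side (hypothetical): continue the same trace with a fresh span ID.
worker_context = child_traceparent(headers["traceparent"])
```

Because the trace ID is unchanged at every hop, the value returned by trace_id_of is the kind of identifier an X-Trace-Id response header could carry for looking up the whole request later.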

LLM calls are instrumented as spans with metadata including:

  • Model name and version
  • Token usage (input/output)
  • Tool call names and iteration count
  • Response latency

Tool executions appear as child spans. You can see exactly how the agent reasoned through a request: which memories it retrieved, which tools it called, how many iterations it needed, and where time was spent.
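
As an illustration of reading such a trace, the sketch below models a span tree and ranks spans by the time spent in the span itself rather than in its children. The Span class, the span names, and the attribute keys are hypothetical and are not Baker Street's actual schema; the numbers are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)


def self_time_ms(span: Span) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes child spans run one after another; overlapping children
    would push the result below zero, so it is clamped at zero.
    """
    return max(0.0, span.duration_ms - sum(c.duration_ms for c in span.children))


def slowest_spans(root: Span, top: int = 3) -> list[tuple[str, float]]:
    """Walk the tree and rank spans by self time, largest first."""
    ranked = []
    stack = [root]
    while stack:
        span = stack.pop()
        ranked.append((span.name, self_time_ms(span)))
        stack.extend(span.children)
    return sorted(ranked, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical trace: one request with an LLM call and a tool call as children.
request = Span(
    "request",
    1200.0,
    children=[
        Span("llm.call", 900.0, {"model": "example-model", "input_tokens": 512}),
        Span("tool.memory_search", 42.0),
    ],
)
ranking = slowest_spans(request)
```

In this made-up trace the LLM call dominates, followed by the request's own overhead, which is the kind of breakdown the trace view in Grafana lets you read off visually.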

Prometheus Metrics

Both the Brain and Worker services expose metrics on a /metrics endpoint:

  • baker_requests_total -- total API requests by endpoint and status
  • baker_llm_calls_total -- LLM API calls by model and outcome
  • baker_llm_tokens_total -- token usage by model and direction (input/output)
  • baker_jobs_total -- jobs dispatched by type and status
  • baker_memory_operations_total -- memory store/search/delete operations
  • baker_extension_tools_active -- currently registered extension tools
  • baker_task_pods_active -- currently running task pods
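
For a quick check without Grafana, the following sketch parses scraped /metrics text in the Prometheus text exposition format and sums one metric by one label. The label names ("model", "direction", "endpoint", "status") and the sample values are assumptions made for the example; the source only says which dimensions each metric is broken down by.

```python
import re
from collections import defaultdict

# One sample line of the text exposition format:
#   metric_name{label="value",...} 123
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)


# Made-up scrape output; label names and values are assumptions.
SAMPLE_SCRAPE = """\
# TYPE baker_llm_tokens_total counter
baker_llm_tokens_total{model="example-model",direction="input"} 1500
baker_llm_tokens_total{model="example-model",direction="output"} 400
baker_llm_tokens_total{model="other-model",direction="input"} 250
baker_requests_total{endpoint="/chat",status="200"} 12
"""

tokens_by_direction = sum_by_label(SAMPLE_SCRAPE, "baker_llm_tokens_total", "direction")
```

With the sample above, tokens_by_direction groups the three token samples into input and output totals and ignores the unrelated baker_requests_total line.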

Structured Logging

All services emit structured JSON logs with consistent fields:

{
  "level": "info",
  "service": "brain",
  "traceId": "abc123...",
  "spanId": "def456...",
  "msg": "Tool call completed",
  "tool": "memory_search",
  "duration_ms": 42
}

Loki collects the logs, and the shared traceId field correlates them with traces. Click a trace in Grafana to see its logs inline.
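
The same correlation can be done by hand on raw log output. This is a minimal sketch that filters JSON log lines by their traceId field; the field names follow the example record above, while the function name, the trace IDs, and the sample lines are hypothetical.

```python
import json


def logs_for_trace(lines, trace_id):
    """Yield parsed log records whose traceId matches, skipping non-JSON lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray plain-text lines in the stream
        if isinstance(record, dict) and record.get("traceId") == trace_id:
            yield record


# Made-up log lines from two services and two traces.
SAMPLE_LOGS = [
    '{"level": "info", "service": "brain", "traceId": "trace-a", "msg": "Request received"}',
    '{"level": "info", "service": "worker", "traceId": "trace-b", "msg": "Job started"}',
    "plain text line that is not JSON",
    '{"level": "info", "service": "brain", "traceId": "trace-a", "msg": "Tool call completed", "tool": "memory_search", "duration_ms": 42}',
]

matching = list(logs_for_trace(SAMPLE_LOGS, "trace-a"))
```

Here matching holds the two brain records for trace-a in their original order, which mirrors what the inline log view shows for a selected trace.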

Grafana Dashboards

Baker Street ships with pre-built Grafana dashboards:

  • Agent Overview -- request rate, response latency, error rate, active conversations
  • LLM Usage -- model distribution, token consumption, cost estimation, latency percentiles
  • Jobs and Workers -- job throughput, queue depth, worker utilization, failure rates
  • Memory -- store/search rates, deduplication hits, Qdrant collection stats
  • Extensions -- active extensions, tool call frequency, extension latency

Access Grafana by port-forwarding its service to local port 3000, then opening http://localhost:3000 in a browser:

kubectl port-forward -n observability svc/grafana 3000:3000