Baker Street

Observability Stack

OpenTelemetry tracing, Prometheus metrics, and structured logging across the entire platform.

The Problem

When an AI agent reasons through a multi-step task (dispatching jobs, querying memory, calling extensions), you need to understand what happened and why. Traditional application logging tells you what occurred in a single process, but an agent spans multiple pods, NATS queues, and external services. Without distributed tracing, debugging a bad response means guessing which component went wrong. Without metrics, you cannot tell whether the system is healthy until a user complains.

How Baker Street Solves It

Baker Street ships an optional observability stack that deploys to a separate Kubernetes namespace. Every component is instrumented with OpenTelemetry, providing end-to-end visibility across the entire agent pipeline.

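The stack includes an OpenTelemetry Collector that receives OTLP over gRPC and HTTP and, as configured in the example below, forwards traces to Tempo, exposes metrics for Prometheus, and pushes logs to Loki.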

LLM calls are instrumented as OpenTelemetry spans with model name, role, iteration count, and token usage as attributes. Tool invocations appear as child spans. You can see exactly how the agent reasoned through a request: which tools it called, in what order, how long each step took, and what the token cost was.

Example

# k8s/observability/otel-collector.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: baker-street-observability
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024

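    exporters:
      # Traces go to Tempo over OTLP/gRPC; TLS is disabled on this connection
      otlp/tempo:
        endpoint: tempo.baker-street-observability:4317
        tls:
          insecure: true
      # Metrics are exposed on port 8889 for Prometheus to scrape
      prometheus:
        endpoint: 0.0.0.0:8889
      # Logs are pushed to Loki's HTTP push endpoint
      loki:
        endpoint: http://loki.baker-street-observability:3100/loki/api/v1/push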

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [loki]

Learn More

See the Observability documentation for deployment instructions, custom dashboard creation, and alerting configuration.