Guardrails
Configure AI guardrail middleware for tool call validation, approval workflows, and output sanitization.
The guardrail system is a composable middleware chain that intercepts every tool call. Most guardrails check the call before it executes, and output sanitization filters the result afterward. Each guardrail performs one specific check and can approve, reject, or escalate the call. Guardrails run in sequence, and a single rejection stops execution.
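As a rough illustration of the sequencing described above, the chain can be thought of as a loop that stops at the first rejection. This is a minimal sketch, not the actual implementation: the `ToolCall`, `GuardrailResult`, and `Guardrail` shapes and the `runChain` function are hypothetical stand-ins for whatever `@bakerst/core` really exports, and collecting escalations for the caller is an assumption.

```typescript
// Hypothetical shapes for illustration; the real types live in @bakerst/core
// and may differ.
type ToolCall = { tool: string; args: Record<string, unknown> };

type GuardrailResult =
  | { action: "approve" }
  | { action: "reject"; reason: string }
  | { action: "escalate"; reason: string };

interface Guardrail {
  name: string;
  check(call: ToolCall): Promise<GuardrailResult>;
}

type ChainOutcome =
  | { allowed: true; escalatedBy: string[] }
  | { allowed: false; rejectedBy: string; reason: string; escalatedBy: string[] };

// Run guardrails in order. A single rejection ends the run immediately.
// Assumption: escalations do not stop the chain; they are collected so the
// caller (for example an approval step) can act on them.
async function runChain(chain: Guardrail[], call: ToolCall): Promise<ChainOutcome> {
  const escalatedBy: string[] = [];
  for (const guardrail of chain) {
    const result = await guardrail.check(call);
    if (result.action === "reject") {
      return { allowed: false, rejectedBy: guardrail.name, reason: result.reason, escalatedBy };
    }
    if (result.action === "escalate") {
      escalatedBy.push(guardrail.name);
    }
  }
  return { allowed: true, escalatedBy };
}
```

Because the loop returns on the first rejection, guardrails later in the chain never see a call that an earlier one refused.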
Guardrail Chain
The default enterprise configuration includes these guardrails in order:
- Prompt Injection Detection -- scans tool inputs for known injection patterns
- Schema Validation -- validates tool call arguments against the tool's JSON Schema
- Destructive Operation Gate -- flags operations that modify infrastructure (delete, restart, scale)
- Human-in-the-Loop Approval -- routes flagged operations to a human for approval
- Output Sanitization -- scrubs secrets and PII from tool outputs before returning to the LLM
Configuration
Guardrails are configured via a YAML file mounted as a ConfigMap:
# k8s/config/guardrails.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: guardrail-config
  namespace: baker-street
data:
  guardrails.yaml: |
    chain:
      - name: prompt-injection
        enabled: true
        config:
          sensitivity: high
          patterns:
            - "ignore previous instructions"
            - "you are now"
            - "system prompt override"
      - name: schema-validation
        enabled: true
      - name: destructive-gate
        enabled: true
        config:
          flagged_tools:
            - "command_execute"
            - "launch_task"
          flagged_patterns:
            - "delete"
            - "rm -rf"
            - "kubectl delete"
            - "DROP TABLE"
          action: require_approval
      - name: human-approval
        enabled: true
        config:
          timeout_seconds: 300
          notification_channel: "telegram"
          auto_deny_on_timeout: true
      - name: output-sanitization
        enabled: true
        config:
          patterns:
            - type: regex
              match: "sk-ant-[a-zA-Z0-9]{20,}"
              replace: "[REDACTED_API_KEY]"
            - type: regex
              match: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
              replace: "[REDACTED_EMAIL]"
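To make the `output-sanitization` entries more concrete, here is a minimal sketch of how regex rules like the ones above might be applied to a tool's output. The `SanitizeRule` type and `sanitizeOutput` function are hypothetical names chosen for illustration, and applying each pattern globally and in order is an assumption about the real behavior.

```typescript
// Mirrors one entry under output-sanitization.config.patterns.
// Hypothetical type; the real config loader may use a different shape.
interface SanitizeRule {
  type: "regex";
  match: string;
  replace: string;
}

// Apply every rule in order, replacing all matches (hence the "g" flag).
// A replacer function is used so the replacement text is inserted literally.
function sanitizeOutput(output: string, rules: SanitizeRule[]): string {
  return rules.reduce(
    (text, rule) => text.replace(new RegExp(rule.match, "g"), () => rule.replace),
    output
  );
}

const exampleRules: SanitizeRule[] = [
  { type: "regex", match: "sk-ant-[a-zA-Z0-9]{20,}", replace: "[REDACTED_API_KEY]" },
  {
    type: "regex",
    match: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}",
    replace: "[REDACTED_EMAIL]",
  },
];

console.log(
  sanitizeOutput("key=sk-ant-abcdefghijklmnopqrstuvwx owner=ops@example.com", exampleRules)
);
// prints "key=[REDACTED_API_KEY] owner=[REDACTED_EMAIL]"
```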
Command Allowlisting
Workers enforce a strict allowlist for shell command execution. Only pre-approved binaries can run:
command_allowlist:
  - kubectl
  - helm
  - curl
  - jq
  - uptime
  - df
  - free
  - ps
Commands not on the allowlist are rejected before execution. Environment variables whose names match secret patterns such as *_KEY, *_TOKEN, or *_SECRET are stripped from the shell environment before the command runs.
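The two checks described above could look roughly like the following. This is only a sketch under stated assumptions: `isAllowed` and `stripSecrets` are hypothetical helpers, the allowlist is matched against the bare first token of the command line, and the secret patterns are treated as name suffixes.

```typescript
// Binaries from the command_allowlist example.
const COMMAND_ALLOWLIST = new Set(["kubectl", "helm", "curl", "jq", "uptime", "df", "free", "ps"]);

// Suffix patterns from the text; the exact matching rules are an assumption.
const SECRET_NAME = /(_KEY|_TOKEN|_SECRET)$/;

// True when the first token of the command line is an allowlisted binary.
// Assumption: bare binary names only, so "/usr/bin/kubectl" is rejected.
function isAllowed(commandLine: string): boolean {
  const binary = commandLine.trim().split(/\s+/)[0] ?? "";
  return COMMAND_ALLOWLIST.has(binary);
}

// Return a copy of the environment without secret-looking variables.
function stripSecrets(env: Record<string, string | undefined>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (value !== undefined && !SECRET_NAME.test(name)) {
      clean[name] = value;
    }
  }
  return clean;
}

console.log(isAllowed("kubectl get pods -n baker-street")); // prints true
console.log(isAllowed("rm -rf /tmp/cache")); // prints false
```

Checking only the first token does not by itself stop shell metacharacters such as `;` or `|` from chaining a second command, so a check like this is usually paired with running the binary directly rather than through a shell.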
Approval Workflow
When a guardrail flags an operation for human approval:
- The tool call is paused and written to a pending approvals queue
- A notification is sent to the configured channel (Telegram, Discord, or Web UI)
- The notification includes the tool name, arguments, and the conversation context
- The human approves or denies within the configured timeout
- On approval, the tool call proceeds. On denial or timeout, it is rejected and Claude receives a message explaining why
[Approval Request]
Tool: command_execute
Command: kubectl delete pod worker-abc123 -n baker-street
Context: User asked to restart the worker pod
Timeout: 5 minutes
[Approve] [Deny]
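The wait-for-a-decision step in the workflow above can be sketched as a race between the human's response and a timer. The `ApprovalChannel` interface, `timeoutAfter`, and `awaitApproval` are hypothetical names used for illustration, and treating a timeout as a denial mirrors `auto_deny_on_timeout: true` from the example configuration.

```typescript
type Decision = "approved" | "denied";

interface ApprovalRequest {
  tool: string;
  args: Record<string, unknown>;
  context: string;
}

// Hypothetical channel abstraction (a Telegram, Discord, or Web UI adapter
// would implement it); resolves when a human clicks Approve or Deny.
interface ApprovalChannel {
  requestDecision(request: ApprovalRequest): Promise<Decision>;
}

// A promise that resolves to "timeout" after ms, plus a way to cancel it.
function timeoutAfter(ms: number): { promise: Promise<"timeout">; cancel: () => void } {
  let cancel = (): void => {};
  const promise = new Promise<"timeout">((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

// Wait for the human's decision, denying automatically if the timer wins.
async function awaitApproval(
  channel: ApprovalChannel,
  request: ApprovalRequest,
  timeoutMs: number
): Promise<Decision> {
  const timeout = timeoutAfter(timeoutMs);
  try {
    const outcome = await Promise.race([channel.requestDecision(request), timeout.promise]);
    return outcome === "timeout" ? "denied" : outcome;
  } finally {
    timeout.cancel(); // avoid a dangling timer once a decision arrives
  }
}
```

With `timeout_seconds: 300`, a caller would pass `300 * 1000` as `timeoutMs`.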
Custom Guardrails
Implement the GuardrailHook interface from @bakerst/core to create custom guardrails:
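import { GuardrailHook, ToolCall, GuardrailResult } from '@bakerst/core';
// Hypothetical helper, not part of @bakerst/core: resolves to today's LLM spend in USD.
import { getDailyLLMCost } from './billing';

export interface CostLimitConfig {
  maxDailyCost: number; // USD
}

export class CostLimitGuardrail implements GuardrailHook {
  name = 'cost-limit';

  // Assumption: the guardrail receives its config through the constructor.
  constructor(private readonly config: CostLimitConfig) {}

  // Reject every tool call once today's spend exceeds the configured limit.
  async check(call: ToolCall): Promise<GuardrailResult> {
    const dailyCost = await getDailyLLMCost();
    if (dailyCost > this.config.maxDailyCost) {
      return {
        action: 'reject',
        reason: `Daily LLM cost limit ($${this.config.maxDailyCost}) exceeded`,
      };
    }
    return { action: 'approve' };
  }
}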
Register the guardrail in the chain configuration and redeploy the Brain.