InitRunner

Guardrails

Guardrails prevent runaway agents by enforcing per-run limits, session budgets, daemon budgets, autonomous budgets, and team budgets. All limits are enforced automatically: an agent stops when it hits a limit, and session and daemon budgets also log a warning at 80% consumption before the hard stop.

Quick Example

guardrails:
  max_tokens_per_run: 50000
  max_tool_calls: 20
  timeout_seconds: 300
  session_token_budget: 200000
  run_token_budget: 80000         # cumulative budget for one CLI invocation, including delegations (since v2026.5.1)
  # Team mode guardrails (kind: Team only)
  team_token_budget: 150000       # cumulative budget across all personas
  team_timeout_seconds: 900       # wall-clock limit for entire team run
  # Daemon resilience (since v2026.4.11)
  retry_policy:
    max_attempts: 3
    backoff_base_seconds: 2.0
    backoff_max_seconds: 30.0
  circuit_breaker:
    failure_threshold: 5
    reset_timeout_seconds: 60

Per-Run Limits

These limits apply to each individual agent run (a single invocation or trigger execution).

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_tokens_per_run | int | 50000 | Maximum output tokens consumed per agent run |
| max_tool_calls | int | 20 | Maximum tool invocations per run |
| timeout_seconds | int | 300 | Wall-clock timeout per run (seconds) |
| max_request_limit | int \| null | auto | Maximum LLM API round-trips per run. Auto-derived as max(max_tool_calls + 10, 30) when not set |
| input_tokens_limit | int \| null | null | Per-request input token limit |
| total_tokens_limit | int \| null | null | Per-request combined input+output token limit |
| run_token_budget | int \| null | null | Cumulative token budget for a single one-shot CLI run; counts the parent run plus completed inline-delegated sub-runs. Override per-invocation with --token-budget N. Since v2026.5.1. |

The per-call limits (max_tokens_per_run, total_tokens_limit, input_tokens_limit, max_request_limit) map to PydanticAI's UsageLimits and bound a single LLM round-trip or a single top-level agent.run. They do not see tokens spent inside delegated sub-agents. run_token_budget is the cumulative cap across the whole invocation, including delegations. See run_token_budget semantics below.
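
The auto-derived default for max_request_limit can be checked with a few lines of Python. This is a minimal sketch of the formula from the table above; the function name effective_request_limit is hypothetical and not part of InitRunner.

```python
from typing import Optional


def effective_request_limit(max_tool_calls: int,
                            max_request_limit: Optional[int] = None) -> int:
    """Return the explicit limit if one is set, else the documented default.

    Hypothetical helper: mirrors max(max_tool_calls + 10, 30) from the table.
    """
    if max_request_limit is not None:
        return max_request_limit
    return max(max_tool_calls + 10, 30)


print(effective_request_limit(20))      # default max_tool_calls -> 30
print(effective_request_limit(50))      # larger tool budget -> 60
print(effective_request_limit(20, 12))  # explicit value wins -> 12
```

With the default max_tool_calls of 20, the floor of 30 applies; the derived value only grows once max_tool_calls exceeds 20.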

run_token_budget semantics

run_token_budget is a cumulative-completed-run guard with best-effort hard-stop, not a live token meter. Available since v2026.5.1.

  • It is checked once before the parent run starts (so a previous over-budget invocation in the same process can short-circuit) and again before every inline delegate sub-run.
  • It records actual usage after the parent run and after each completed sub-run. PydanticAI only exposes per-agent.run usage when the run finishes.
  • It will stop a cascading delegate chain the moment the cumulative count crosses the cap.
  • It does not abort a single runaway parent mid-stream when the parent never delegates. In that case the per-call limits (max_tokens_per_run, total_tokens_limit) remain the relevant guard.
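
The check-then-record behavior in the list above can be sketched as a small simulation. This is an illustration under stated assumptions, not InitRunner's implementation: the class and function names are hypothetical, and the assumption that the check trips once usage is at or above the cap is ours.

```python
class BudgetExceeded(Exception):
    """Hypothetical error raised when the cumulative budget is already spent."""


class RunTokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def check(self):
        # Called before the parent run and before each inline delegate.
        # Assumption: the guard trips once usage reaches the cap.
        if self.used >= self.limit:
            raise BudgetExceeded(f"{self.used}/{self.limit} tokens used")

    def record(self, tokens):
        # Called only after a run completes, when its usage is known.
        self.used += tokens


def simulate(budget, parent_tokens, delegate_costs):
    """Simulate one invocation; return how many delegates completed."""
    budget.check()                    # before the parent run starts
    completed = 0
    for cost in delegate_costs:
        try:
            budget.check()            # before every inline delegate sub-run
        except BudgetExceeded:
            break                     # the delegate chain stops here
        budget.record(cost)           # usage is recorded after the sub-run
        completed += 1
    budget.record(parent_tokens)      # parent usage is recorded at the end
    return completed


budget = RunTokenBudget(limit=80_000)
done = simulate(budget, parent_tokens=10_000, delegate_costs=[30_000] * 4)
print(done, budget.used)              # 3 delegates ran; usage overshoots the cap
```

In this simulation the third delegate starts at 60,000 tokens (under the cap) and finishes at 90,000, so the budget overshoots before the fourth delegate is refused. That is the "best-effort hard-stop" behavior: the guard only sees usage from completed runs.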

run_token_budget does not apply to --autonomous runs (use autonomous_token_budget for those) or to daemon mode (use the daemon_* budgets).

Session Budgets

guardrails:
  session_token_budget: 500000

session_token_budget tracks cumulative token usage across interactive REPL turns (-i mode). The agent warns at 80% consumption and stops accepting new prompts at 100%.

This is useful for long-running interactive sessions where you want to cap total spend.
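
As a rough mental model, the session budget behaves like a counter with two thresholds. The sketch below is illustrative only; SessionBudget and its methods are hypothetical names, and integer percentages are used to avoid floating-point comparisons.

```python
class SessionBudget:
    """Hypothetical model of session_token_budget across REPL turns."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record_turn(self, tokens):
        # Add the tokens consumed by one completed REPL turn.
        self.used += tokens

    def status(self):
        if self.used >= self.limit:
            return "exhausted"        # no new prompts are accepted
        if self.used * 100 >= self.limit * 80:
            return "warning"          # at or above 80% consumption
        return "ok"


session = SessionBudget(limit=500_000)
for turn_tokens in (300_000, 100_000, 100_000):
    session.record_turn(turn_tokens)
    print(session.used, session.status())
```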

Daemon Budgets

Daemon-mode agents (initrunner run --daemon) can have lifetime and daily budgets:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| daemon_token_budget | int \| null | null | Lifetime token budget for the daemon process |
| daemon_daily_token_budget | int \| null | null | Daily token budget, resets at midnight in budget_timezone |

guardrails:
  daemon_token_budget: 1000000
  daemon_daily_token_budget: 100000

When a daemon budget is exhausted, triggers are skipped until the budget resets (daily) or the daemon is restarted (lifetime).

USD Cost Budgets

Daemon-mode agents can also enforce USD-based cost limits alongside token budgets. Cost is estimated per run using the genai-prices library.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| daemon_daily_cost_budget | float \| null | null | Maximum USD spend per calendar day |
| daemon_weekly_cost_budget | float \| null | null | Maximum USD spend per ISO week |
| budget_timezone | str | "UTC" | IANA timezone for daily/weekly budget resets (e.g. "America/New_York") |

guardrails:
  daemon_daily_cost_budget: 10.00
  daemon_weekly_cost_budget: 50.00
  budget_timezone: "America/New_York"   # resets at midnight Eastern

Daily cost resets at midnight in the configured budget_timezone (UTC by default). Weekly cost resets when the ISO week number changes. You can also override the timezone from the CLI:

initrunner run role.yaml --daemon --budget-timezone America/New_York
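
To see why budget_timezone matters, it helps to compute which calendar day and ISO week a given instant falls into. The sketch below uses only the Python standard library; the function name budget_periods and the key formats are hypothetical and chosen for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def budget_periods(now_utc: datetime, budget_timezone: str = "UTC"):
    """Return (daily_key, weekly_key) for an instant, seen from budget_timezone.

    A budget counter would reset whenever its key changes.
    """
    local = now_utc.astimezone(ZoneInfo(budget_timezone))
    iso = local.isocalendar()                     # (ISO year, ISO week, weekday)
    return local.date().isoformat(), f"{iso[0]}-W{iso[1]:02d}"


# Monday 2026-01-05 03:00 UTC is still Sunday evening in New York.
instant = datetime(2026, 1, 5, 3, 0, tzinfo=timezone.utc)
print(budget_periods(instant, "UTC"))               # ('2026-01-05', '2026-W02')
print(budget_periods(instant, "America/New_York"))  # ('2026-01-04', '2026-W01')
```

The same instant belongs to different days and different ISO weeks depending on the timezone, so a daemon configured for America/New_York would not yet have reset either counter at that moment.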

When a cost budget is exhausted, triggers are skipped just like token budgets. At startup, InitRunner validates that pricing data is available for the role's model. If genai-prices doesn't cover the model, the daemon exits with a clear error.

Budget counters are persisted to the audit database after each run. Restarting a daemon or bot restores the counters, so spend tracking survives process restarts (since v2026.4.11).

Token and cost budgets are enforced independently; either limit being hit will pause the daemon. See Cost Tracking for CLI analytics and dashboard UI.

Daemon Resilience

Since v2026.4.11, daemon-mode agents can retry failed runs and track provider health with a circuit breaker. Both features live under spec.guardrails.

Retry Policy

When a trigger fires and the agent run fails with a transient provider error (rate limit, 5xx, connection failure), the daemon retries the entire run with exponential backoff.

| Field | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| retry_policy.max_attempts | int | 1 | 1-5 | Total attempts per trigger fire (1 = no retry) |
| retry_policy.backoff_base_seconds | float | 2.0 | 0.5-30 | Base delay for exponential backoff |
| retry_policy.backoff_max_seconds | float | 30.0 | 1-300 | Maximum backoff delay |

Only transient provider errors are retried: HTTP 429 (rate limit), HTTP 5xx (server error), and connection failures. Timeouts, auth errors, content blocks, and usage limits are not retried.

Side effects: retries re-execute the entire agent run, including tool calls. Only enable retry for idempotent roles or when failures happen before tool execution (provider-level errors).

guardrails:
  retry_policy:
    max_attempts: 3
    backoff_base_seconds: 2.0
    backoff_max_seconds: 30.0
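
For a feel of what these settings produce, the sketch below computes a capped exponential schedule and classifies errors by HTTP status. It assumes the delay doubles on each retry and that no jitter is applied; the source does not state the exact formula, so treat both as assumptions. The function names are hypothetical.

```python
from typing import List, Optional


def backoff_delays(max_attempts: int, base: float, cap: float) -> List[float]:
    """Delays (seconds) waited before attempts 2..max_attempts.

    Assumption: delay = min(base * 2**n, cap), with no jitter.
    """
    return [min(base * 2 ** n, cap) for n in range(max_attempts - 1)]


def is_retryable(status_code: Optional[int]) -> bool:
    """True for the transient provider errors listed above.

    None stands in for a connection failure (no HTTP response received).
    """
    if status_code is None:
        return True
    return status_code == 429 or 500 <= status_code <= 599


print(backoff_delays(3, 2.0, 30.0))    # [2.0, 4.0]
print(backoff_delays(5, 10.0, 30.0))   # [10.0, 20.0, 30.0, 30.0]
print(is_retryable(429), is_retryable(503), is_retryable(401))
```

With the configuration shown above (3 attempts, 2.0 s base), a trigger waits at most 6 seconds in total before giving up under these assumptions.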

Circuit Breaker

The circuit breaker tracks provider health across trigger fires. After enough consecutive failures, it stops dispatching new runs until the provider recovers.

| Field | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| circuit_breaker.failure_threshold | int | 5 | 1-100 | Consecutive failures before the circuit opens |
| circuit_breaker.reset_timeout_seconds | int | 60 | 10-3600 | Seconds before a half-open probe |

State machine: CLOSED (normal) -> OPEN (all runs skipped) after hitting the failure threshold -> HALF_OPEN (one probe allowed) after the reset timeout -> back to CLOSED on success or OPEN again on failure.

Only provider-health errors trip the breaker: rate limits, server errors, connection failures, and auth errors (401/403). Application-level errors like content blocks and usage limits are ignored.

State transitions are logged as security audit events (circuit_open, circuit_half_open, circuit_closed).
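
The state machine described above can be sketched in a few dozen lines. This is a generic circuit breaker written to match the documented transitions, not InitRunner's actual class; the names and the injectable clock are illustrative assumptions.

```python
import time


class CircuitBreaker:
    """Hypothetical CLOSED -> OPEN -> HALF_OPEN breaker for illustration."""

    def __init__(self, failure_threshold=5, reset_timeout_seconds=60,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_seconds = reset_timeout_seconds
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def allow_run(self):
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout_seconds:
                return False           # still cooling down: skip the run
            self.state = "HALF_OPEN"   # reset timeout elapsed: allow one probe
            return True
        return False                   # HALF_OPEN: a probe is already in flight

    def record_success(self):
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        # Only provider-health errors should be reported here.
        self._failures += 1
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()


now = [0.0]                            # fake clock so the demo runs instantly
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_seconds=60,
                         clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.state, breaker.allow_run())   # OPEN False
now[0] = 60.0
print(breaker.allow_run(), breaker.state)   # True HALF_OPEN
breaker.record_success()
print(breaker.state)                        # CLOSED
```

A failed probe in HALF_OPEN reopens the circuit and restarts the reset timer, which matches the "OPEN again on failure" branch.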

guardrails:
  circuit_breaker:
    failure_threshold: 5
    reset_timeout_seconds: 60

Set circuit_breaker: null (the default) to disable.

Autonomous Limits

These fields control resource usage for autonomous mode runs:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_iterations | int | 10 | Maximum plan-execute-adapt cycles |
| autonomous_token_budget | int \| null | null | Token budget for the autonomous run |
| autonomous_timeout_seconds | int \| null | null | Wall-clock timeout for the entire autonomous run |

guardrails:
  max_iterations: 10
  autonomous_token_budget: 50000
  autonomous_timeout_seconds: 600

When any autonomous limit is hit, the agent stops and reports its progress via finish_task.
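
The interplay of the three limits can be pictured as a loop that checks each one before starting another cycle. The sketch below is an assumption about ordering and granularity (limits checked between iterations, never mid-iteration); autonomous_loop and step are hypothetical names, and the real agent reports progress via finish_task rather than returning a tuple.

```python
import time


def autonomous_loop(step, max_iterations=10, token_budget=None,
                    timeout_seconds=None, clock=time.monotonic):
    """Run step() until it finishes or a limit is hit.

    step() returns (tokens_used, done). The result is
    (stop_reason, iterations_completed, total_tokens).
    """
    start = clock()
    tokens = 0
    for iteration in range(1, max_iterations + 1):
        # Assumption: limits are checked before each cycle begins.
        if token_budget is not None and tokens >= token_budget:
            return "autonomous_token_budget", iteration - 1, tokens
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            return "autonomous_timeout_seconds", iteration - 1, tokens
        used, done = step()
        tokens += used
        if done:
            return "finished", iteration, tokens
    return "max_iterations", max_iterations, tokens


# Each cycle costs 20,000 tokens and never finishes on its own.
print(autonomous_loop(lambda: (20_000, False), token_budget=50_000))
print(autonomous_loop(lambda: (1_000, False), max_iterations=5))
```

In the first call the loop stops after three cycles with 60,000 tokens used, because the budget is only seen to be exceeded when the fourth cycle is about to start.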

Team Budgets

These fields control resource usage for team mode runs (kind: Team):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| team_token_budget | int \| null | null | Cumulative token budget across all personas in a team run. Pipeline stops if exceeded. Team mode only. |
| team_timeout_seconds | int \| null | null | Wall-clock limit for entire team run. Pipeline stops if exceeded. Team mode only. |

guardrails:
  team_token_budget: 150000
  team_timeout_seconds: 900

Team budgets protect team runs from unbounded spend across personas. Per-run limits (max_tokens_per_run, timeout_seconds) still apply to each individual persona. See Team Mode.

Enforcement Behavior

Each limit type has specific enforcement behavior:

| Limit | What Happens |
| --- | --- |
| max_tokens_per_run | PydanticAI raises UsageLimitExceeded; the run stops immediately |
| max_tool_calls | PydanticAI raises UsageLimitExceeded; the run stops immediately |
| timeout_seconds | Python raises TimeoutError; the run is cancelled |
| max_request_limit | PydanticAI raises UsageLimitExceeded; no more API round-trips |
| input_tokens_limit | PydanticAI raises UsageLimitExceeded on the next request |
| total_tokens_limit | PydanticAI raises UsageLimitExceeded on the next request |
| session_token_budget | Warns at 80%, stops accepting prompts at 100% |
| daemon_token_budget | Triggers are skipped when exhausted |
| daemon_daily_token_budget | Triggers are skipped until midnight reset (in budget_timezone) |
| daemon_daily_cost_budget | Triggers are skipped until midnight reset (in budget_timezone) |
| daemon_weekly_cost_budget | Triggers are skipped until ISO week rolls over (in budget_timezone) |
| retry_policy | Failed run is retried with exponential backoff (transient errors only) |
| circuit_breaker | All trigger runs are skipped while circuit is open |
| max_iterations | Autonomous loop terminates, agent reports progress |
| autonomous_token_budget | Autonomous loop terminates, agent reports progress |
| autonomous_timeout_seconds | Autonomous loop terminates, agent reports progress |
| team_token_budget | Team pipeline stops, partial results returned |
| team_timeout_seconds | Team pipeline stops, partial results returned |

Budget warnings apply to session_token_budget, daemon_token_budget, daemon_daily_token_budget, daemon_daily_cost_budget, and daemon_weekly_cost_budget. Warnings are logged at 80% and 95% consumption so operators can take action before the hard stop.
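
One way to log each warning exactly once is to compare usage before and after a run against the thresholds. The sketch below illustrates that idea with integer arithmetic; crossed_thresholds is a hypothetical helper, and how InitRunner actually deduplicates warnings is not stated in this page.

```python
def crossed_thresholds(used_before, used_after, budget, thresholds=(80, 95)):
    """Return the warning thresholds (in percent) crossed by this update."""
    return [
        pct for pct in thresholds
        if used_before * 100 < budget * pct <= used_after * 100
    ]


budget = 100_000
print(crossed_thresholds(70_000, 85_000, budget))  # [80]
print(crossed_thresholds(85_000, 90_000, budget))  # []
print(crossed_thresholds(70_000, 96_000, budget))  # [80, 95]
```

A single large run can cross both thresholds at once, in which case both warnings would be due in the same update.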

Visibility

Guardrail status is surfaced across multiple interfaces:

| Surface | What's Shown |
| --- | --- |
| initrunner validate | Warns if guardrails are missing or misconfigured |
| REPL subtitle | Live token usage and remaining budget |
| Dashboard status bar | Per-run and session budget consumption bars |
| Dashboard API | /api/agents/:id/usage endpoint returns current budget state |
| Audit logs | Every limit hit is recorded with the limit name and value |

Tool Output Limits

Individual tool outputs are capped to prevent a single response from consuming the entire context window:

| Tool | Max Output Size | Behavior When Exceeded |
| --- | --- | --- |
| read_file | 1 MB | Output is truncated with a [truncated] marker |
| http_request | 100 KB | Response body is truncated; headers are preserved |
| shell | 100 KB | stdout/stderr combined output is truncated |
| search_documents | 50 KB | Results are truncated; match count is still reported |

These limits are not configurable — they are hard-coded safety rails to protect context window budget. If you need larger outputs, read files in chunks or paginate HTTP responses.
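
Byte-based truncation of text output can be sketched as follows. This is an illustration, not the tool's code: cap_output is a hypothetical name, the marker placement (on its own trailing line) is an assumption, and so is the choice to drop a multi-byte character that straddles the cut.

```python
TRUNCATION_MARKER = "[truncated]"


def cap_output(text: str, max_bytes: int) -> str:
    """Return text unchanged if it fits, else a clipped copy plus a marker."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a partial multi-byte character at the cut point.
    clipped = data[:max_bytes].decode("utf-8", errors="ignore")
    return clipped + "\n" + TRUNCATION_MARKER


print(cap_output("abc", 10))
print(cap_output("a" * 20, 5))
```

Measuring the cap in encoded bytes rather than characters keeps the limit meaningful for non-ASCII output, at the cost of occasionally returning slightly fewer bytes than the cap allows.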

Example Configurations

Cost-Conscious Development

Tight limits for iterative development where you want fast feedback and low spend:

guardrails:
  max_tokens_per_run: 10000
  max_tool_calls: 10
  timeout_seconds: 60
  session_token_budget: 50000

Production Daemon

A daemon role with daily budgets and autonomous limits:

apiVersion: initrunner/v1
kind: Agent
metadata:
  name: monitor-agent
  description: Monitors infrastructure and auto-remediates issues
spec:
  role: |
    You are an infrastructure monitor. Check system health when triggered,
    diagnose issues, and apply standard remediations.
  model:
    provider: openai
    name: gpt-4o-mini
    temperature: 0.0
  tools:
    - type: shell
      allowed_commands: [curl, systemctl, journalctl]
      require_confirmation: false
      timeout_seconds: 30
  triggers:
    - type: cron
      schedule: "*/5 * * * *"
      prompt: "Run a health check on all services."
      autonomous: true
  autonomy:
    max_plan_steps: 8
    max_history_messages: 20
    iteration_delay_seconds: 2
  guardrails:
    # Per-run limits
    max_tokens_per_run: 15000
    max_tool_calls: 10
    timeout_seconds: 120
    # Daemon budgets
    daemon_token_budget: 5000000
    daemon_daily_token_budget: 500000
    # Cost budgets
    daemon_daily_cost_budget: 10.00
    daemon_weekly_cost_budget: 50.00
    budget_timezone: "UTC"
    # Daemon resilience
    retry_policy:
      max_attempts: 3
      backoff_base_seconds: 2.0
      backoff_max_seconds: 30.0
    circuit_breaker:
      failure_threshold: 5
      reset_timeout_seconds: 60
    # Autonomous limits
    max_iterations: 5
    autonomous_token_budget: 30000
    autonomous_timeout_seconds: 300

RAG with Budget

A knowledge-base agent with session budgets to cap interactive usage:

guardrails:
  max_tokens_per_run: 30000
  max_tool_calls: 15
  timeout_seconds: 180
  session_token_budget: 200000
  input_tokens_limit: 16000

CLI Overrides

# Override max iterations for autonomous mode
initrunner run role.yaml -a --max-iterations 5

# Override the per-run cumulative token budget for one invocation
initrunner run role.yaml --token-budget 80000

The --max-iterations N flag overrides the max_iterations value from the YAML file for that run. The --token-budget N flag (since v2026.5.1) overrides guardrails.run_token_budget for that invocation; it caps the parent run plus any inline-delegated sub-agents.
