InitRunner

Autonomous Mode

Autonomous mode lets an agent plan its own work, execute steps, adapt when things go wrong, and signal completion — all without human input. It's enabled by the spec.autonomy section and the -a CLI flag.

Reasoning strategies (react, todo_driven, plan_execute, reflexion) orchestrate agent behavior across autonomous turns. See Reasoning Primitives for the full guide.

How It Works

An autonomous agent follows a plan-execute-adapt loop:

  1. Plan — The agent creates a structured todo list using the todo tool
  2. Execute — It works through each step using its tools
  3. Adapt — If a step fails, the agent modifies its plan (add retries, skip, investigate)
  4. Finish — The agent calls finish_task — or the loop auto-completes when all todo items reach terminal status
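A minimal agent spec wiring these pieces together might look like the following sketch (values are illustrative; the tool, autonomy, and guardrail fields are documented later on this page):

```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: loop-demo
spec:
  role: |
    Plan your work with the todo tool, execute each step, adapt on
    failures, and call finish_task when everything is done.
  model:
    provider: openai
    name: gpt-5-mini
  tools:
    - type: todo            # structured planning for step 1
  autonomy:
    max_plan_steps: 10      # enables autonomy; finish_task is auto-registered
  guardrails:
    max_iterations: 6       # hard stop for the plan-execute-adapt loop
```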

The finish_task tool is auto-registered when autonomy is enabled. Task tracking comes from the todo tool — add type: todo to spec.tools for structured planning:

| Tool | Source | Description |
|---|---|---|
| finish_task(status, summary) | Auto-registered | Signal task completion with an overall status and summary |
| add_todo, batch_add_todos, update_todo, ... | type: todo tool | Structured task management. See Tools — Todo |

Loop Mechanics

Each autonomous run follows a precise iteration sequence:

  1. Iteration 1 — The agent receives the user prompt plus the system prompt. It creates its initial todo list, then begins executing the first step.
  2. Iterations 2+ — The continuation_prompt is injected with the current ReflectionState (todo progress, completed items, failures). The active reasoning strategy shapes these continuation prompts. The agent continues executing, adapting, or re-planning.
  3. Budget visibility — Each continuation prompt includes a BUDGET block showing remaining resources:
    BUDGET:
    - Iteration: 4/10 (40%)
    - Tokens: 18,200/30,000 (61%)
    - Time: 142s/300s (47%)
    This gives the agent awareness of its resource constraints at every turn, enabling it to prioritize critical tasks and call finish_task before hitting a hard limit. Fields are omitted when no budget is configured for that dimension.
  4. History trimming and compaction — When conversation messages exceed max_history_messages, the oldest messages are dropped (keeping the system prompt and the most recent messages). Alternatively, enable history compaction to LLM-summarize old messages before trimming, preserving key context. This prevents context window exhaustion on long runs.
  5. Budget check — Before each iteration, the runner checks autonomous_token_budget, max_iterations, and autonomous_timeout_seconds. If any limit is reached, the loop terminates.
  6. Terminal conditions — The loop ends when:
    • The agent calls finish_task (status: completed)
    • Any guardrail limit is hit (status: max_iterations, budget_exceeded, or timeout)
    • The agent reports it is stuck (status: blocked or failed)
    • An unrecoverable error occurs (status: error)
  7. Rate limiting — If iteration_delay_seconds is set (> 0), the runner sleeps between iterations to avoid API rate limits.
  8. Result — The final ReflectionState is returned with the terminal status, the plan steps (with their statuses), and the agent's summary.
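The iteration sequence above can be sketched in Python (this is an illustration of the documented control flow, not InitRunner's actual internals; `agent` stands in for one full iteration of model calls and tool use):

```python
import time

def autonomous_loop(agent, prompt, guardrails, autonomy):
    """Illustrative sketch of the autonomous iteration sequence."""
    state = {"status": None, "tokens": 0, "start": time.monotonic()}
    for iteration in range(1, guardrails["max_iterations"] + 1):
        # Budget check before each iteration (step 5).
        if guardrails.get("autonomous_token_budget") and \
           state["tokens"] >= guardrails["autonomous_token_budget"]:
            state["status"] = "budget_exceeded"
            break
        if guardrails.get("autonomous_timeout_seconds") and \
           time.monotonic() - state["start"] >= guardrails["autonomous_timeout_seconds"]:
            state["status"] = "timeout"
            break
        # Iteration 1 gets the user prompt; later iterations get the
        # continuation prompt with the current ReflectionState (steps 1-2).
        message = prompt if iteration == 1 else autonomy["continuation_prompt"]
        state = agent(message, state)
        if state["status"] in {"completed", "blocked", "failed", "error"}:
            break  # terminal condition (step 6)
        if autonomy.get("iteration_delay_seconds"):
            time.sleep(autonomy["iteration_delay_seconds"])  # rate limiting (step 7)
    else:
        state["status"] = "max_iterations"  # guardrail hit without finish_task
    return state  # final state with terminal status (step 8)
```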

Example: Deployment Checker

A complete autonomous agent that verifies deployments:

apiVersion: initrunner/v1
kind: Agent
metadata:
  name: deployment-checker
  description: Autonomous deployment verification agent
  tags: [devops, autonomous, deployment]
spec:
  role: |
    You are a deployment verification agent. When given one or more URLs to check,
    create a todo list with one item per URL, execute each check, and produce a
    pass/fail report.

    Workflow:
    1. Use batch_add_todos to create a checklist — one item per URL to verify
    2. Use get_next_todo to pick the next item
    3. Run curl -sSL -o /dev/null -w "%{http_code} %{time_total}s" for each URL
    4. Mark each item completed (2xx) or failed (anything else) via update_todo
    5. If a check fails, add a retry item with add_todo
    6. When done, send a Slack summary with pass/fail results per URL
    7. Call finish_task with the overall status
  model:
    provider: openai
    name: gpt-5-mini
    temperature: 0.0
  tools:
    - type: think
    - type: todo
      max_items: 12
    - type: shell
      allowed_commands:
        - curl
      require_confirmation: false
      timeout_seconds: 30
    - type: slack
      webhook_url: "${SLACK_WEBHOOK_URL}"
      default_channel: "#deployments"
      username: Deploy Checker
      icon_emoji: ":white_check_mark:"
  reasoning:
    pattern: todo_driven
    auto_plan: true
  autonomy:
    max_plan_steps: 12
    max_history_messages: 20
    iteration_delay_seconds: 1
    max_scheduled_per_run: 1
  guardrails:
    max_iterations: 6
    autonomous_token_budget: 30000
    max_tokens_per_run: 10000
    max_tool_calls: 15
    session_token_budget: 100000
Run it with:

initrunner run deployment-checker.yaml -a \
  -p "Verify https://api.example.com/health and https://api.example.com/ready"

Configuration

The spec.autonomy section controls planning behavior:

| Field | Type | Default | Description |
|---|---|---|---|
| max_plan_steps | int | 20 | Maximum steps allowed in a plan |
| max_history_messages | int | 40 | Messages kept in context during iteration |
| iteration_delay_seconds | int | 0 | Pause between iterations (prevents tight loops) |
| continuation_prompt | str | "Continue working on the task..." | Prompt injected at each iteration to keep the agent on track |
| max_scheduled_per_run | int | 3 | Maximum follow-up tasks scheduled per autonomous run |
| max_scheduled_total | int | 50 | Maximum total scheduled tasks across all runs |
| max_schedule_delay_seconds | int | 86400 | Maximum delay allowed when scheduling a follow-up (seconds) |
| compaction.enabled | bool | false | Enable LLM-driven summarization of old messages before trimming |
| compaction.threshold | int | 30 | Minimum message count before compaction activates |
| compaction.tail_messages | int | 6 | Number of recent messages to keep verbatim (not summarized) |
| compaction.model_override | str \| null | null | Model to use for summarization. Defaults to the role's model |
| compaction.summary_prefix | str | "[CONVERSATION HISTORY SUMMARY]\n" | Prefix prepended to the LLM summary |
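Spelled out as YAML (the dotted compaction.* names nest under a compaction block), the defaults above are:

```yaml
spec:
  autonomy:
    max_plan_steps: 20
    max_history_messages: 40
    iteration_delay_seconds: 0
    continuation_prompt: "Continue working on the task..."
    max_scheduled_per_run: 3
    max_scheduled_total: 50
    max_schedule_delay_seconds: 86400
    compaction:
      enabled: false
      threshold: 30
      tail_messages: 6
      model_override: null
      summary_prefix: "[CONVERSATION HISTORY SUMMARY]\n"
```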

History Compaction

Long-running autonomous agents can lose important context when older messages are dropped by simple history trimming. History compaction solves this by using an LLM call to summarize older messages before they are trimmed, preserving key decisions, tool results, and open tasks.

Configuration

spec:
  autonomy:
    compaction:
      enabled: true
      threshold: 30
      tail_messages: 6
      model_override: "openai:gpt-4o-mini"
      summary_prefix: "[CONVERSATION HISTORY SUMMARY]\n"

How It Works

After each iteration, if compaction.enabled is true and the conversation history exceeds compaction.threshold messages:

  1. The most recent tail_messages messages are set aside (kept verbatim).
  2. All older messages (except the first message, which is always preserved) are sent to an LLM for summarization.
  3. The summary replaces the old messages as a single message, prefixed with summary_prefix.
  4. Normal history trimming (max_history_messages) runs after compaction.
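The four steps above can be sketched as a single function. This is an illustration of the documented behavior, not InitRunner's code; `summarize` stands in for the LLM call, and the role given to the summary message is an assumption:

```python
def compact_history(messages, threshold, tail_messages, summarize,
                    summary_prefix="[CONVERSATION HISTORY SUMMARY]\n"):
    """Illustrative sketch of history compaction."""
    if len(messages) <= threshold:
        return messages  # below threshold: no LLM call (step of the Behavior list)
    head = messages[0]                    # first message is always preserved
    tail = messages[-tail_messages:]      # most recent messages kept verbatim
    middle = messages[1:-tail_messages]   # everything else gets summarized
    try:
        summary = summary_prefix + summarize(middle)
    except Exception:
        return messages  # fail-open: keep the original history on LLM error
    # The "system" role here is illustrative, not the documented message shape.
    return [head, {"role": "system", "content": summary}, *tail]
```

Normal trimming against max_history_messages would then run on the returned list.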

Behavior

  • Fail-open — if the summarization LLM call fails, the original history is kept and trimming proceeds normally. Errors are logged but never crash the loop.
  • Threshold-based — compaction only activates when message count exceeds threshold, avoiding unnecessary LLM calls on short runs.
  • Tail preservation — the tail_messages most recent messages are never summarized, ensuring the agent always has full fidelity on its latest actions.
  • Model flexibility — use model_override to route summarization to a cheaper or faster model (e.g. gpt-4o-mini) to save tokens on the primary model.

See the long-running-analyst example for a complete configuration using compaction.

Guardrails

Autonomous agents need spending limits since they run without human oversight. These fields in spec.guardrails control resource usage:

| Field | Type | Default | Scope | Description |
|---|---|---|---|---|
| max_iterations | int | 10 | per-run | Maximum plan-execute-adapt cycles |
| autonomous_token_budget | int \| null | null | per-run | Token budget for the autonomous run |
| autonomous_timeout_seconds | int \| null | null | per-run | Wall-clock timeout for the entire autonomous run |
| max_tokens_per_run | int | 50000 | per-iteration | Maximum output tokens consumed per iteration |
| max_tool_calls | int | 20 | per-iteration | Maximum tool invocations per iteration |
| timeout_seconds | int | 300 | per-iteration | Wall-clock timeout per iteration |
| max_request_limit | int \| null | auto | per-iteration | Maximum LLM API round-trips per iteration. Auto-derived as max(max_tool_calls + 10, 30) |
| session_token_budget | int \| null | null | session | Cumulative token budget for the REPL session |
| daemon_token_budget | int \| null | null | daemon | Lifetime token budget for the daemon process |
| daemon_daily_token_budget | int \| null | null | daemon | Daily token budget — resets at UTC midnight |
| max_scheduled_per_run | int | 3 | scheduling | Maximum follow-up tasks scheduled per autonomous run |
| max_scheduled_total | int | 50 | scheduling | Maximum total scheduled tasks across all runs |

When any limit is hit, the agent stops and reports its progress. See Guardrails for full enforcement behavior, daemon budgets, and all available limits.

Scheduling Tools

When autonomy is combined with daemon mode, two additional tools are auto-registered for scheduling follow-up tasks:

| Tool | Description |
|---|---|
| schedule_followup(prompt, delay_seconds) | Schedule a follow-up task to run after a delay (in seconds) |
| schedule_followup_at(prompt, iso_datetime) | Schedule a follow-up task at a specific ISO 8601 datetime |

Both tools are limited by max_scheduled_per_run and max_scheduled_total from the autonomy config. Scheduled follow-ups always run in autonomous mode.

Note: Scheduled tasks are in-memory only and are lost on daemon restart.

autonomy:
  max_scheduled_per_run: 3
  max_scheduled_total: 50
  max_schedule_delay_seconds: 86400  # max 24 hours

Autopilot Mode

Autopilot is daemon mode where every trigger runs the full autonomous loop instead of single-shot execution. One flag turns it on:

initrunner run role.yaml --autopilot

A daemon responds. An autopilot thinks, then responds. Someone messages your Telegram bot "find me flights from NYC to London next week." In daemon mode, you get one shot at an answer. In autopilot, the agent searches the web, compares options, checks dates, and sends back something worth reading.

All trigger types support this, including Telegram and Discord.

Per-Trigger Configuration

If you only want autonomous execution on specific triggers, set autonomous: true per trigger instead of using --autopilot globally:

spec:
  triggers:
    - type: cron
      schedule: "0 */6 * * *"
      prompt: "Check system health and remediate issues."
      autonomous: true
    - type: telegram
      token_env: TELEGRAM_BOT_TOKEN
      allowed_users: ["alice"]
      autonomous: true          # full autonomous loop per message
    - type: file_watch
      paths: ["./reports"]
      extensions: [".csv"]
      prompt_template: "Process new report: {path}"
      # autonomous: false (default) -- quick single response

Daemon vs Autopilot

| | --daemon | --autopilot |
|---|---|---|
| Triggers fire | Yes | Yes |
| Autonomous execution | Only where autonomous: true is set | All triggers |
| Telegram/Discord support | Single-shot unless autonomous: true | Full autonomous loop |
| Guardrails apply | Yes | Yes |
| Scheduling tools | When spec.autonomy is configured | When spec.autonomy is configured |

Guardrails

All existing guardrails apply in autopilot mode: max_iterations, autonomous_token_budget, autonomous_timeout_seconds, max_tool_calls, daemon_token_budget, and daemon_daily_token_budget. The agent stops and reports progress if any limit is hit. See Guardrails for the full list.

Scheduled follow-ups (via schedule_followup / schedule_followup_at) always run in autonomous mode regardless of per-trigger config.

CLI Flags

| Flag | Description |
|---|---|
| -a, --autonomous | Enable autonomous mode for this run |
| --autopilot | Daemon mode with all triggers autonomous |
| --max-iterations N | Override max_iterations from the YAML |

# Enable autonomous mode
initrunner run role.yaml -a -p "Check all endpoints"

# Autopilot -- all triggers use the autonomous loop
initrunner run role.yaml --autopilot

# Override max iterations
initrunner run role.yaml -a --max-iterations 3 -p "Quick check"

Reflection State

At each iteration, the agent's current state is captured as a ReflectionState and injected into the continuation prompt. This gives the agent awareness of what it has accomplished and what remains.

ReflectionState contains:

| Field | Type | Description |
|---|---|---|
| completed | bool | Whether the agent has called finish_task |
| summary | str | Running summary of progress |
| status | str | Current status label |
| todo_list | TodoList | The current todo list tracking task progress |

Each todo item has description, status, priority, notes, and depends_on fields. See Tools — Todo for the full status and priority reference.

The reflection state — including the formatted todo list — is rendered and appended to the continuation_prompt at the start of each iteration. The active reasoning strategy may customize how this state is presented.
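As a rough sketch, the injection described above amounts to string assembly like the following (the exact rendering is strategy-dependent and the layout here is an assumption; field names follow the table above):

```python
def render_continuation(continuation_prompt, state):
    """Sketch of appending ReflectionState to the continuation prompt.

    The real rendering is strategy-dependent; this only shows the shape.
    """
    lines = [
        continuation_prompt,
        "",
        f"Status: {state['status']}",
        f"Summary: {state['summary']}",
        "Todo:",
    ]
    for item in state["todo_list"]:
        lines.append(f"  [{item['status']}] {item['description']}")
    return "\n".join(lines)
```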

Memory Integration

Autonomous mode integrates with the Memory system for persistence and recall:

  • Session save (--resume) — When memory is configured and the agent is run with --resume, the conversation history (including plan steps and tool outputs) is saved at the end of the run. The next --resume invocation restores context so the agent can pick up where it left off.
  • finish_task episodic capture — When the agent calls finish_task, the summary is persisted as an episodic memory with category autonomous_run (if episodic memory is enabled). This allows future runs or other agents to recall past outcomes.
  • recall tool — If memory is enabled, the recall tool is auto-registered. The agent can search all memory types (semantic, episodic, procedural) for past results, patterns, and decisions. Pass memory_types to filter by type. This is useful for agents that run repeatedly (e.g., via cron triggers) and need to avoid repeating past work.
  • Consolidation on exit — When consolidation.interval is after_autonomous, consolidation runs automatically after the autonomous loop exits, extracting durable semantic facts from episodic records. See Memory: Consolidation.
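Wiring up the consolidation hook might look like the fragment below. Only consolidation.interval: after_autonomous is documented on this page; the surrounding spec.memory nesting and the episodic field are assumptions, so check the Memory docs for the authoritative schema:

```yaml
spec:
  memory:
    # Sketch only -- field names below are assumptions except
    # consolidation.interval, which is documented above.
    episodic:
      enabled: true              # required for finish_task episodic capture
    consolidation:
      interval: after_autonomous # consolidate when the autonomous loop exits
```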

Terminal Statuses

When an autonomous run ends, it produces a final_status indicating how it concluded:

| Status | Description | Success? |
|---|---|---|
| completed | Agent called finish_task successfully | Yes |
| max_iterations | Reached the max_iterations limit | Yes |
| blocked | Agent is stuck and cannot proceed | No |
| failed | Agent encountered a failure it couldn't recover from | No |
| budget_exceeded | Token budget exhausted | No |
| timeout | autonomous_timeout_seconds elapsed | No |
| error | Unexpected error during execution | No |

completed and max_iterations are considered successful outcomes. All others indicate the run did not finish its intended work.
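For scripts that consume run results, the success rule above reduces to a set-membership check (a convenience sketch, not part of the InitRunner API):

```python
# The two statuses the docs treat as successful outcomes.
SUCCESS_STATUSES = {"completed", "max_iterations"}

def run_succeeded(final_status: str) -> bool:
    """Return True when a terminal status counts as a successful run."""
    return final_status in SUCCESS_STATUSES
```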

When to Use Autonomous Mode

Good fit:

  • Verification tasks (deployment checks, health audits)
  • Batch processing (process a list of items with per-item steps)
  • Multi-step investigations (diagnose an issue, try fixes)
  • Tasks with clear completion criteria

Consider alternatives:

  • Recurring tasks → use Triggers with daemon mode instead
  • Multi-agent workflows → use Flow for coordination
  • Interactive exploration → use REPL mode (-i) for human-in-the-loop

Troubleshooting

Agent never calls finish_task

Cause: The system prompt doesn't instruct the agent to call finish_task, or the agent gets stuck in an adapt loop creating new steps indefinitely.

Fix: Explicitly instruct the agent to call finish_task in spec.role. Set max_iterations and max_plan_steps to enforce hard stops. The max_iterations terminal status is still considered a successful outcome.
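A role excerpt combining both fixes might read as follows (wording and values are illustrative):

```yaml
spec:
  role: |
    ...
    When every todo item is completed or failed, call finish_task with
    an overall status and a one-paragraph summary. Do not add new todo
    items once you begin final verification.
  autonomy:
    max_plan_steps: 10    # caps runaway re-planning
  guardrails:
    max_iterations: 6     # hard stop if the agent still never finishes
```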

Token budget exceeded

Cause: The autonomous token budget is too small for the task, or the agent is producing verbose tool outputs that consume tokens quickly.

Fix: Increase autonomous_token_budget or reduce per-iteration output by lowering model.max_tokens. Check if shell or HTTP tools are returning large outputs — tool output limits (see Guardrails) apply automatically, but the agent may be making too many calls. Reduce max_tool_calls to limit per-iteration tool usage.

Scheduled tasks lost on daemon restart

Cause: Scheduled follow-ups (via schedule_followup / schedule_followup_at) are stored in-memory only. When the daemon process restarts, all pending scheduled tasks are lost.

Fix: Use cron triggers for recurring tasks instead of schedule_followup. For critical follow-ups, have the agent write the schedule to a file or external system (e.g., a database) and use a cron trigger to check for pending work.

Agent makes no tool calls

Cause: The model is responding with text-only messages instead of invoking tools. This typically happens when the system prompt is too vague, or when max_tool_calls is set to 0.

Fix: Verify max_tool_calls is greater than 0. Make the system prompt explicit about which tools to use and when. Add example workflows in spec.role that reference tool names directly.
