Reasoning Primitives
InitRunner's reasoning system gives agents structured cognitive tools and execution strategies. Two orthogonal layers compose naturally: cognitive tools (think, todo, spawn) that the LLM uses voluntarily within a turn, and reasoning strategies (react, todo_driven, plan_execute, reflexion) that orchestrate behavior across turns in Autonomous Mode.
All reasoning tools are run-scoped — built fresh per-run with isolated state, never leaking across REPL/daemon sessions.
Quick Start
Minimal autonomous agent with structured reasoning:
apiVersion: initrunner/v1
kind: Agent
metadata:
name: planner
description: Autonomous planner with structured reasoning
spec:
role: |
You are a senior project planner. Break tasks into structured
todo lists, research each item, and synthesize findings.
model:
provider: openai
name: gpt-5.4-mini-2026-03-17
tools:
- type: think
critique: true
- type: todo
max_items: 20
- type: search
provider: duckduckgo
reasoning:
pattern: todo_driven
auto_plan: true
autonomy:
max_plan_steps: 20
guardrails:
max_iterations: 15
autonomous_token_budget: 100000Run it:
initrunner run planner.yaml -a -p "Research the top 3 Python web frameworks and compare them."The agent will:
- Create a structured todo list (batch_add_todos)
- Work through each item (get_next_todo, update_todo)
- Auto-complete when all items reach terminal status
Think Tool
Gives the agent a scratchpad that accumulates reasoning as a numbered chain. Unlike a plain "thought recorded" response, the agent sees its full reasoning history on every call — surviving context trimming.
Configuration
tools:
- type: think
critique: true # nudge self-critique every 5th thought
max_thoughts: 30 # ring buffer capacity (default: 50)| Field | Type | Default | Description |
|---|---|---|---|
critique | bool | false | Append self-critique nudge every 5th thought |
max_thoughts | int | 50 | Ring buffer capacity (1-200) |
How it works
Each think(thought) call appends the thought and returns the full numbered chain:
Thoughts (3):
1. The user wants a CLI tool, so startup time matters
2. Python is faster to develop but Rust compiles to a single binary
3. Distribution is the key differentiator hereWith critique: true, every 5th thought appends:
You have recorded 5 thoughts. Before proceeding, critically evaluate your reasoning. What assumptions might be wrong? What have you missed?
The ring buffer evicts the oldest thought when full, bounding token overhead to ~3500 tokens at 50 thoughts.
When to use
- Always add
type: thinkfor agents doing multi-step reasoning - Enable critique for complex tasks where self-correction matters
- Reduce max_thoughts for agents with tight token budgets
Modes
The think tool works in both single-shot and autonomous mode. In single-shot, the agent can call it multiple times within one run. In autonomous mode, thoughts persist across iterations through the run-scoped state.
Todo Tool
Priority-aware task management with dependency resolution. Operates on the agent's unified ReflectionState, giving a single source of truth for progress.
Configuration
tools:
- type: todo
max_items: 30 # max concurrent items (default: 30)
shared: false # sub-agent visibility (default: false)
shared_path: "" # SQLite path (required when shared: true)| Field | Type | Default | Description |
|---|---|---|---|
max_items | int | 30 | Maximum concurrent items (1-100) |
shared | bool | false | Back state with SQLite for sub-agent access |
shared_path | str | "" | SQLite file path (required when shared: true) |
Tool functions exposed to the LLM
| Tool | Description |
|---|---|
add_todo(description, priority?, depends_on?) | Create an item. Returns its 8-char ID + the full formatted list. |
batch_add_todos(items) | Create multiple items at once. Supports inter-batch dependency refs via index ("0", "1", ...). |
update_todo(id, status?, notes?, priority?) | Update fields on an existing item. Returns the full formatted list. |
remove_todo(id) | Remove an item and clean up dangling dependency references. |
list_todos(status_filter?) | Show all items, or filter by status. |
get_next_todo() | Return the highest-priority pending item whose dependencies are all in terminal status. |
finish_task(summary, status) | Explicitly signal task completion (completed/blocked/failed). |
Statuses
| Status | Terminal? | Icon | Description |
|---|---|---|---|
pending | No | [ ] | Not started |
in_progress | No | [>] | Currently being worked on |
completed | Yes | [x] | Successfully finished |
failed | Yes | [!] | Failed |
skipped | Yes | [-] | Intentionally skipped |
Priority ordering
critical > high > medium > low. get_next_todo() returns the highest-priority pending item whose dependencies are all in terminal status.
Dependencies
Items can depend on other items by ID. The agent specifies dependencies when creating items:
add_todo("Deploy to staging", priority="high", depends_on=["abc12345"])Or in batch, using 0-based batch indices:
batch_add_todos([
{"description": "Write tests", "priority": "high"},
{"description": "Run tests", "depends_on": ["0"]},
{"description": "Deploy", "depends_on": ["1"]}
])Cycles are detected via Kahn's algorithm and rejected immediately. When an item is removed, dangling dependency references in other items are cleaned up.
Auto-completion
When every item in the list reaches a terminal status (completed, failed, or skipped), the autonomous loop automatically signals completion. The agent does not need to call finish_task explicitly — though it can do so at any time to override.
Shared mode
When shared: true, the todo list is backed by SQLite with WAL mode for concurrent access. Sub-agents spawned via the delegate or spawn tool can read and update the same list.
tools:
- type: todo
shared: true
shared_path: ./.initrunner/shared_todo.dbSpawn Tool
Non-blocking parallel agent execution. Spawn sub-agents as background tasks, poll for results, and await completion — all within a single agent run.
Configuration
tools:
- type: spawn
max_concurrent: 3 # parallel task limit (default: 4, max: 16)
timeout_seconds: 120 # per-task timeout (default: 300)
agents:
- name: researcher
role_file: ./agents/researcher.yaml
description: Researches a specific topic
- name: coder
role_file: ./agents/coder.yaml
description: Writes and reviews code
shared_memory: # optional shared memory for sub-agents
store_path: ./.initrunner/shared.db
max_memories: 1000| Field | Type | Default | Description |
|---|---|---|---|
agents | list | required | Agent refs with name, role_file or url, and description |
max_concurrent | int | 4 | Maximum parallel tasks (1-16) |
max_depth | int | 3 | Maximum delegation depth |
timeout_seconds | int | 300 | Per-task wall-clock timeout |
shared_memory | object | null | Shared LanceDB memory config |
Each agent ref needs either role_file (inline execution via InlineInvoker) or url (remote execution via McpInvoker).
Tool functions exposed to the LLM
| Tool | Description |
|---|---|
spawn_agent(agent_name, prompt) | Submit a background task. Returns immediately with a task_id. |
poll_tasks(task_ids?) | Check status of specific tasks or all. Returns a formatted status table. |
await_tasks(task_ids) | Block until all specified tasks complete. Returns their results. |
await_any(task_ids) | Block until any one task completes. Returns its result. |
cancel_task(task_id) | Cancel a running background task. |
How it works
The spawn pool maintains a private asyncio event loop in a daemon thread. When the agent calls spawn_agent, the task is submitted via asyncio.run_coroutine_threadsafe(). The underlying invokers (InlineInvoker for local agents, McpInvoker for remote) run via asyncio.to_thread().
Task statuses: running, completed, failed, timeout.
The pool is cleaned up when the run ends — remaining tasks are cancelled and the event loop is stopped.
Typical usage pattern
1. spawn_agent("researcher", "Find stats on Python adoption") -> task_a
2. spawn_agent("researcher", "Find stats on Rust adoption") -> task_b
3. await_tasks([task_a, task_b]) -> results
4. Synthesize results into final answerReasoning Strategies
The spec.reasoning config controls how the autonomous runner orchestrates agent behavior across turns. Strategies operate in Autonomous Mode only (-a flag).
Configuration
spec:
reasoning:
pattern: todo_driven # react | todo_driven | plan_execute | reflexion
auto_plan: true # prepend planning instructions to first turn
reflection_rounds: 0 # post-completion self-critique rounds (reflexion only)
auto_detect: true # infer pattern from tool/autonomy config| Field | Type | Default | Description |
|---|---|---|---|
pattern | string | "react" | Reasoning pattern to use |
auto_plan | bool | false | Prepend "create a todo list" to first turn |
reflection_rounds | int | 0 | Number of self-critique rounds after completion |
reflection_dimensions | list | null | Custom dimensions for reflexion self-critique (overrides defaults) |
auto_detect | bool | true | Infer pattern from tool/autonomy config |
Patterns
react (default)
Standard ReAct loop. The LLM decides when and how to use tools. No extra orchestration from the runner. This is the pattern every agent uses today.
reasoning:
pattern: reacttodo_driven
Plan-first execution. The runner prepends instructions to create a structured todo list on the first turn. Continuation prompts guide the agent: "Check your todo list. Get the next item and work on it."
Requires a todo tool in spec.tools.
tools:
- type: todo
reasoning:
pattern: todo_driven
auto_plan: true # recommendedHow it works:
- First turn: prompt is prefixed with "Before starting, create a structured todo list..."
- Subsequent turns: "Check your todo list. Call get_next_todo..."
- Loop exits when all items reach terminal status (auto-completion) or the agent calls
finish_task
plan_execute
Two-phase execution. Phase 1 (planning): the agent creates a comprehensive plan without executing. Phase 2 (execution): the agent works through plan items. The agent explicitly calls finalize_plan() to transition from planning to execution.
Requires a todo tool in spec.tools.
tools:
- type: todo
reasoning:
pattern: plan_executeHow it works:
- First turn: "PHASE 1 - PLANNING: Analyze this task and create a comprehensive todo list. Focus only on planning. Do not execute yet."
- Planning continues until the agent calls
finalize_plan()to signal the plan is complete. The tool rejects empty plans. - Phase transition: "PHASE 2 - EXECUTION: Work through your plan."
- Execution continues until auto-completion or
finish_task
The finalize_plan() tool is auto-registered when plan_execute is the active pattern. It takes no arguments and returns a confirmation that the phase has transitioned.
reflexion
Post-completion self-critique with dimension-specific evaluation. After the agent finishes (calls finish_task or todo auto-completes), the runner re-opens the state and injects structured critique prompts targeting specific quality dimensions.
Requires reflection_rounds > 0.
reasoning:
pattern: reflexion
reflection_rounds: 3 # one round per dimensionHow it works:
- Agent works normally until completion
- Each reflection round focuses on a specific dimension. By default, the three dimensions are correctness, completeness, and clarity, each with a structured evaluation rubric. Example prompt: "REFLECTION (1/3) — CORRECTNESS: Check for factual errors, logical flaws, and incorrect assumptions in your output."
- Agent gets
reflection_roundsadditional turns to self-correct against each dimension - Final output is from the last iteration
Reflexion composes with other patterns. A todo_driven agent with reflection_rounds: 1 first completes its todo list, then gets one critique pass.
Configuring reflection dimensions
Override the default dimensions with custom evaluation criteria:
reasoning:
pattern: reflexion
reflection_rounds: 3
reflection_dimensions:
- name: correctness
prompt: "Check for factual errors, logical flaws, and incorrect assumptions."
- name: completeness
prompt: "Are there missing sections, gaps in coverage, or unanswered questions?"
- name: clarity
prompt: "Is the structure logical and easy to follow? Is the language clear?"When reflection_dimensions is set, each round uses the corresponding dimension's prompt. If reflection_rounds exceeds the number of dimensions, extra rounds cycle back through the list.
Auto-detection
When auto_detect: true (the default) and no explicit pattern is set:
| Condition | Detected pattern |
|---|---|
Has todo tool + spec.autonomy configured | todo_driven |
Has reflection_rounds > 0 | reflexion |
| Everything else | react |
Explicit pattern setting always overrides auto-detection.
Validation
The loader validates reasoning config at build time:
todo_drivenorplan_executewithout atodotool raisesRoleLoadErrorreflexionwithreflection_rounds == 0raisesRoleLoadError
Zero-Config Examples
You don't need to set spec.reasoning explicitly. Auto-detection picks the right pattern:
Minimal todo agent (auto-detects todo_driven)
apiVersion: initrunner/v1
kind: Agent
metadata:
name: task-agent
description: Agent with structured task tracking
spec:
role: You are a helpful assistant that plans work carefully.
model:
provider: openai
name: gpt-5.4-mini-2026-03-17
tools:
- type: think
- type: todo
autonomy:
max_plan_steps: 15
guardrails:
max_iterations: 10
autonomous_token_budget: 50000initrunner run task-agent.yaml -a -p "Summarize the key differences between REST and GraphQL"Single-shot with think (auto-detects react)
apiVersion: initrunner/v1
kind: Agent
metadata:
name: reasoner
description: Agent that thinks before answering
spec:
role: |
You are a careful analyst. Always use the think tool to reason
step by step before giving your answer.
model:
provider: openai
name: gpt-5.4-mini-2026-03-17
tools:
- type: think
critique: trueinitrunner run reasoner.yaml -p "Should we migrate from REST to GraphQL?"Composing Primitives
The Tools compose naturally through LLM reasoning. No special wiring needed.
think + todo (structured reasoning)
The agent uses think to reason about each todo item before working on it:
tools:
- type: think
critique: true
- type: todo
reasoning:
pattern: todo_driven
auto_plan: truetodo + spawn (parallel research)
The agent creates a todo list, spawns background agents for parallelizable items, awaits results, then updates statuses:
tools:
- type: todo
- type: spawn
agents:
- name: researcher
role_file: ./agents/researcher.yaml
reasoning:
pattern: todo_driven
auto_plan: truetodo + reflexion (self-correcting planner)
The agent plans and executes via todo_driven, then gets one round of self-critique:
tools:
- type: todo
- type: think
critique: true
reasoning:
pattern: todo_driven
auto_plan: true
reflection_rounds: 1Run-Scoped Tool Architecture
Reasoning tools carry per-run state (thought chains, todo lists, spawn pools). Standard tools are built once at agent-build time and reused across runs. Run-scoped tools are different: they are built fresh for each run with isolated state, preventing leaks across REPL/daemon sessions.
How it works
- Tool author marks a tool as run-scoped in the registration decorator:
@register_tool("todo", TodoToolConfig, run_scoped=True)
def build_todo_toolset(config, ctx, state):
...build_toolsets()automatically skips run-scoped tools during agent construction- The runner calls
build_run_scoped_toolsets()at the start of each run to construct them with fresh state - Run-scoped toolsets are passed as
extra_toolsetstoexecute_run()
Creating custom run-scoped tools
If you're building a custom tool that needs per-run state:
from initrunner.agent.tools._registry import register_tool, ToolBuildContext
from initrunner.agent.schema.tools import ToolConfigBase
from pydantic_ai.toolsets.function import FunctionToolset
class MyStatefulConfig(ToolConfigBase):
type: Literal["my_stateful"] = "my_stateful"
@register_tool("my_stateful", MyStatefulConfig, run_scoped=True)
def build_my_toolset(config, ctx):
state = [] # fresh per-run
toolset = FunctionToolset()
@toolset.tool_plain
def record(value: str) -> str:
state.append(value)
return f"Recorded {len(state)} values."
return toolsetFull Example: Autonomous Research Lead
apiVersion: initrunner/v1
kind: Agent
metadata:
name: research-lead
description: Autonomous research lead with parallel workers and self-critique
spec:
role: |
You are a research lead. Given a topic:
1. Break it into research questions (todo list)
2. Spawn researchers for parallelizable questions
3. Synthesize findings into a structured report
4. Self-critique before finalizing
model:
provider: openai
name: gpt-5.4-mini-2026-03-17
tools:
- type: think
critique: true
- type: todo
max_items: 15
- type: spawn
max_concurrent: 3
agents:
- name: web-researcher
role_file: ./agents/web-researcher.yaml
description: Searches the web and summarizes findings
- name: data-analyst
role_file: ./agents/data-analyst.yaml
description: Analyzes data and produces charts
- type: filesystem
root_path: ./output
read_only: false
reasoning:
pattern: todo_driven
auto_plan: true
reflection_rounds: 1
autonomy:
max_plan_steps: 20
guardrails:
max_iterations: 20
autonomous_token_budget: 150000
timeout_seconds: 600initrunner run research-lead.yaml -a -p "Compare the top 3 vector databases for production RAG systems"