InitRunner

Providers

The default model is anthropic/claude-sonnet-4-6. You can switch to any supported provider, a local Ollama instance, or a custom OpenAI-compatible endpoint by changing the spec.model block in your role YAML.

Standard Providers

Change provider and name, then install the matching extra if needed:

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6

| Provider | Env Var | Extra to install | Example model |
| --- | --- | --- | --- |
| anthropic | ANTHROPIC_API_KEY | initrunner[anthropic] | claude-sonnet-4-6 |
| openai | OPENAI_API_KEY | (included) | gpt-5-mini |
| google | GOOGLE_API_KEY | initrunner[google] | gemini-2.5-flash |
| groq | GROQ_API_KEY | initrunner[groq] | llama-4-scout-17b-16e |
| mistral | MISTRAL_API_KEY | initrunner[mistral] | mistral-large-latest |
| cohere | CO_API_KEY | initrunner[all-models] | command-a |
| bedrock | AWS_ACCESS_KEY_ID | initrunner[all-models] | us.anthropic.claude-sonnet-4-6-v1:0 |
| xai | XAI_API_KEY | initrunner[all-models] | grok-4 |

Install all provider extras at once with pip install initrunner[all-models].

Provider Snippets

Anthropic (pip install initrunner[anthropic]):

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6

OpenAI (no extra required):

spec:
  model:
    provider: openai
    name: gpt-5-mini

Google (pip install initrunner[google]):

spec:
  model:
    provider: google
    name: gemini-2.5-flash

Groq (pip install initrunner[groq]):

spec:
  model:
    provider: groq
    name: llama-4-scout-17b-16e

Mistral (pip install initrunner[mistral]):

spec:
  model:
    provider: mistral
    name: mistral-large-latest

Cohere (pip install initrunner[all-models]):

spec:
  model:
    provider: cohere
    name: command-a

Bedrock (pip install initrunner[all-models]):

spec:
  model:
    provider: bedrock
    name: us.anthropic.claude-sonnet-4-6-v1:0

xAI (pip install initrunner[all-models]):

spec:
  model:
    provider: xai
    name: grok-4

CLI Provider Switching

Instead of editing YAML, you can switch providers with the configure command:

# Interactive: pick provider and model from menus
initrunner configure role.yaml

# Non-interactive
initrunner configure role.yaml --provider anthropic --model claude-sonnet-4-6

# Configure an installed role by name
initrunner configure code-reviewer --provider groq

# Revert to the original provider/model
initrunner configure code-reviewer --reset

For installed roles (from InitHub or OCI), overrides are stored in registry.json so the original YAML stays pristine. Overrides survive hub updates and reinstalls.

Post-install adaptation: After initrunner install, the CLI checks whether you have the API key required by the role's provider. If the key is missing, it lists your available providers and offers one-step adaptation. Pass --yes to auto-adapt non-interactively.

See CLI Reference: Configure Options for the full flag reference.
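
The post-install key check described above can be pictured with a short Python sketch. It is not InitRunner's code: the function names are hypothetical, and the only facts taken from this page are the provider-to-variable pairs in the Standard Providers table.

```python
import os

# Env var per provider, copied from the Standard Providers table.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "cohere": "CO_API_KEY",
    "bedrock": "AWS_ACCESS_KEY_ID",
    "xai": "XAI_API_KEY",
}


def available_providers(env=None):
    """Return providers whose key variable is set and non-empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [p for p, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def needs_adaptation(role_provider, env=None):
    """True when the role's provider expects a key that is not set."""
    var = PROVIDER_ENV_VARS.get(role_provider)
    if var is None:  # e.g. ollama, which needs no API key
        return False
    env = os.environ if env is None else env
    return not env.get(var)
```

With only OPENAI_API_KEY set, a role written for anthropic would be flagged and openai would be offered as the available alternative.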

Dashboard Provider Setup

You can configure API keys directly from the web dashboard, with no terminal required. The provider setup form is available in three places:

  • Launchpad zero-state (shown on fresh installs before any agents are created)
  • Agent creation page (prompted when no provider is configured)
  • System page (full provider management panel with status indicators)

The dashboard supports all standard providers (OpenAI, Anthropic, Google, Groq, Mistral, Cohere, Bedrock, xAI) plus OpenRouter. Key validation is available for OpenAI and Anthropic, where the dashboard checks the key before saving.

For programmatic use, two API endpoints are available:

  • GET /api/providers/status returns configuration state for all providers
  • POST /api/providers/save-key saves an API key for a provider (accepts optional base_url for custom endpoints)
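
A minimal Python sketch of calling the two endpoints follows. The paths and the optional base_url field come from the list above; the dashboard address and the JSON field names provider and api_key are assumptions made for illustration, so check them against your installation before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: wherever your dashboard is served


def status_request(base_url=BASE_URL):
    """Build the GET request for provider configuration state."""
    return urllib.request.Request(f"{base_url}/api/providers/status", method="GET")


def save_key_request(provider, api_key, key_base_url=None, base_url=BASE_URL):
    """Build the POST request that saves a key; payload field names are assumed."""
    payload = {"provider": provider, "api_key": api_key}
    if key_base_url is not None:
        payload["base_url"] = key_base_url  # optional, for custom endpoints
    return urllib.request.Request(
        f"{base_url}/api/providers/save-key",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send a request:
#   with urllib.request.urlopen(status_request()) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it keeps the sketch testable without a running dashboard.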

Model Selection

PROVIDER_MODELS in templates.py maintains curated model lists for each provider. The conversational builder (initrunner new) and setup wizard (initrunner setup) present these as a numbered menu. The --model flag on new and setup bypasses the interactive prompt. Custom model names are always accepted; the curated list is a convenience, not a restriction.

| Provider | Model | Description |
| --- | --- | --- |
| openai | gpt-5.4 | Latest frontier model (default) |
| openai | gpt-5-mini | Fast, affordable |
| openai | gpt-5-nano | Smallest, ultra-fast |
| openai | gpt-4.1 | GPT-4.1 |
| openai | o4-mini | Fast reasoning |
| openai | o3 | Reasoning model |
| anthropic | claude-sonnet-4-6 | Balanced, fast (default) |
| anthropic | claude-opus-4-6 | Most capable |
| anthropic | claude-haiku-4-5-20251001 | Compact, very fast |
| google | gemini-2.5-flash | Fast multimodal (default) |
| google | gemini-2.5-pro | Most capable |
| google | gemini-2.5-flash-lite | Lightweight |
| groq | llama-4-scout-17b-16e | Llama 4 Scout (default) |
| groq | llama-3.3-70b-versatile | Fast Llama 70B |
| groq | llama-3.1-8b-instant | Ultra-fast 8B |
| mistral | mistral-large-latest | Most capable (default) |
| mistral | mistral-small-latest | Fast, efficient |
| mistral | codestral-latest | Code-optimized |
| mistral | devstral-small-2505 | Agentic coding |
| cohere | command-a | Most capable, 256K context (default) |
| cohere | command-r-plus | Advanced RAG |
| cohere | command-r | Balanced |
| bedrock | us.anthropic.claude-sonnet-4-6-v1:0 | Claude Sonnet 4.6 via Bedrock (default) |
| bedrock | us.anthropic.claude-haiku-4-5-v1:0 | Claude Haiku 4.5 via Bedrock |
| bedrock | us.meta.llama4-scout-17b-instruct-v1:0 | Llama 4 Scout via Bedrock |
| xai | grok-4 | Most capable Grok (default) |
| xai | grok-4-fast | Fast, 2M context |
| xai | grok-3-mini-beta | Lightweight |
| ollama | llama3.2 | Llama 3.2 (default) |
| ollama | llama3.1 | Llama 3.1 |
| ollama | mistral | Mistral 7B |
| ollama | codellama | Code Llama |
| ollama | qwen2.5 | Multilingual |

For Ollama, the wizard also queries the local Ollama server for installed models and shows those if available.
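
To see what such a query might return on your machine, the Python sketch below lists installed models through Ollama's /api/tags endpoint. Whether the wizard uses this exact endpoint is an assumption; the helper names are hypothetical.

```python
import json
import urllib.request


def parse_installed_models(tags_payload):
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_installed_models(host="http://localhost:11434", timeout=2.0):
    """Return installed model names, or [] when no Ollama server answers."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return parse_installed_models(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return []
```

Returning an empty list on failure mirrors the "if available" behavior described above: the curated list can still be shown when no server is reachable.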

Ollama (Local Models)

Set provider: ollama. No API key is needed, and the runner defaults to http://localhost:11434/v1:

spec:
  model:
    provider: ollama
    name: llama3.2

Override the URL if Ollama is on a different host or port:

spec:
  model:
    provider: ollama
    name: llama3.2
    base_url: http://192.168.1.50:11434/v1

Docker note: If the runner is inside Docker and Ollama is on the host, use http://host.docker.internal:11434/v1 as the base_url.

See Ollama for a full setup guide.

OpenRouter / Custom Endpoints

Any OpenAI-compatible API works. Set provider: openai, point base_url at the endpoint, and tell the runner which env var holds the API key:

spec:
  model:
    provider: openai
    name: anthropic/claude-sonnet-4
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY

This also works for vLLM, LiteLLM, Azure OpenAI, or any other service that exposes the OpenAI chat completions format.

Embedding endpoints: api_key_env works for all embedding providers (standard and custom) via ingest.embeddings.api_key_env or memory.embeddings.api_key_env. When set, InitRunner validates the key at startup and fails fast with an actionable error if it's missing. See Ingestion: Embedding Options for details.
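
The fail-fast behavior around api_key_env can be sketched in Python as follows. This is an illustration only: the error class and function are hypothetical, and falling back to OPENAI_API_KEY when api_key_env is unset is an assumption for the provider: openai case.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def resolve_api_key(model_cfg, env=None):
    """Read the key from the variable named by api_key_env, or fail at startup."""
    env = os.environ if env is None else env
    # Assumption: without api_key_env, the provider's usual variable applies.
    var = model_cfg.get("api_key_env") or "OPENAI_API_KEY"
    key = env.get(var)
    if not key:
        raise MissingApiKeyError(
            f"Environment variable {var} is not set; export it or change api_key_env."
        )
    return key
```

Naming the missing variable in the message is what makes the error actionable: the reader learns which variable to export without opening the role file.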

Fallback Chain

Since v2026.4.17, spec.model.fallback accepts a list of provider:model strings (or aliases). When set, the runner wraps the primary and fallbacks in PydanticAI's FallbackModel so runs survive single-provider outages (5xx, 429, auth failures, connection resets) without any call-site changes.

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
    fallback:
      - openai:gpt-5-mini
      - groq:llama-4-scout-17b-16e

Model aliases defined in ~/.initrunner/models.yaml are accepted in the fallback list too:

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
    fallback:
      - smart
      - fast

Validation happens at load time. Every entry is resolved and the matching provider SDK import is probed, so a missing extra fails when the role loads rather than at the first failover. When a run exhausts the chain, the error string lists every provider's failure in order.

Restrictions. Ollama and custom-base_url providers are rejected in the fallback list because aliases can't carry a base_url. If you need a local fallback, promote the Ollama model to the primary and put cloud providers in the fallback chain, or use an explicit OpenAI-compatible endpoint alias.
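
The load-time resolution and the restriction above can be sketched in Python. This is not the loader's code: the function is hypothetical, and the alias table is passed in as a plain dict because the format of ~/.initrunner/models.yaml is not described here.

```python
# Standard providers accepted in a fallback list (ollama is deliberately absent).
STANDARD_PROVIDERS = {
    "openai", "anthropic", "google", "groq",
    "mistral", "cohere", "bedrock", "xai",
}


def resolve_fallback_entry(entry, aliases):
    """Return (provider, model) for one fallback entry, or raise ValueError."""
    target = aliases.get(entry, entry)  # an alias expands to "provider:model"
    # Split on the first colon only; Bedrock model IDs contain a colon too.
    provider, sep, model = target.partition(":")
    if not sep or not model:
        raise ValueError(f"{entry!r} is neither a known alias nor provider:model")
    if provider not in STANDARD_PROVIDERS:
        raise ValueError(f"{entry!r}: provider {provider!r} is not allowed in fallback")
    return provider, model
```

Running this over every entry when the role loads is what lets a bad entry surface immediately instead of during an outage.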

Choosing which errors trigger failover. Since v2026.6.4, spec.model.fallback_on narrows or widens which exceptions move the run to the next candidate. By default, failover triggers on ModelAPIError (the base for any provider API or HTTP failure). Set fallback_on to a list of PydanticAI exception names to change that:

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
    fallback:
      - openai:gpt-5-mini
    fallback_on:
      - ModelHTTPError
      - ContentFilterError

Valid names are ModelAPIError (the default), ModelHTTPError (HTTP status errors only), UnexpectedModelBehavior, and ContentFilterError. fallback_on requires a non-empty fallback list; a role that sets fallback_on without fallback is rejected at load time.
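
The two load-time rules for fallback_on can be written as a small Python check. The function is a hypothetical sketch that works on a plain dict; it does not import PydanticAI and is not the loader's implementation.

```python
VALID_FALLBACK_ON = {
    "ModelAPIError",
    "ModelHTTPError",
    "UnexpectedModelBehavior",
    "ContentFilterError",
}


def effective_fallback_on(model_cfg):
    """Validate fallback_on and return the exception names that trigger failover."""
    fallback = model_cfg.get("fallback") or []
    names = model_cfg.get("fallback_on") or []
    if names and not fallback:
        raise ValueError("fallback_on requires a non-empty fallback list")
    unknown = [n for n in names if n not in VALID_FALLBACK_ON]
    if unknown:
        raise ValueError(f"unknown exception names in fallback_on: {unknown}")
    # An empty list means the documented default applies.
    return names or ["ModelAPIError"]
```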

Model Aliases & Runtime Override

You can define semantic aliases (fast, smart, local) in ~/.initrunner/models.yaml and override the model at runtime with --model or INITRUNNER_MODEL. See Model Aliases for full details.

# Override model at runtime
initrunner run role.yaml -p "hello" --model fast

# Use alias in role YAML (provider auto-resolved)
spec:
  model:
    name: fast

Advanced Model Settings

Since v2026.6.4, spec.model accepts a set of passthrough settings that go straight to PydanticAI's ModelSettings. Leave any of them unset to use the provider default.

spec:
  model:
    provider: openai
    name: gpt-5-mini
    top_p: 0.9
    top_k: 40
    seed: 42
    stop_sequences: ["\n\nUser:"]
    parallel_tool_calls: true
    presence_penalty: 0.5
    frequency_penalty: 0.2
    logit_bias: { "1734": -100 }
    extra_headers: { "X-Title": "my-agent" }
    extra_body: { "provider": { "order": ["anthropic"] } }

| Field | Type | Description |
| --- | --- | --- |
| top_p | float (0.0-1.0) | Nucleus sampling threshold |
| top_k | int | Top-k sampling cutoff |
| seed | int | Best-effort deterministic sampling on providers that support it |
| stop_sequences | list of strings | Sequences that end generation when produced |
| parallel_tool_calls | bool | Whether the model may request multiple tool calls in one turn |
| presence_penalty | float (-2.0-2.0) | Penalize tokens already present |
| frequency_penalty | float (-2.0-2.0) | Penalize tokens by frequency |
| logit_bias | map of string to int | Per-token likelihood adjustments |
| extra_headers | map of string to string | Extra HTTP headers sent with every model request |
| extra_body | mapping | Extra JSON merged into the provider request body (provider-specific routing flags, etc.) |

The sampling knobs (top_p, top_k, presence_penalty, frequency_penalty, logit_bias) are dropped on OpenAI reasoning models, the same way temperature already is. extra_headers and extra_body are always passed through.
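
As a rough Python illustration of that filtering, the sketch below drops the sampling knobs for OpenAI reasoning models and keeps everything else. How InitRunner recognizes a reasoning model is not stated here; the name-prefix test is purely an assumption, and the function is hypothetical.

```python
# Knobs named above as dropped on OpenAI reasoning models, plus temperature.
SAMPLING_KNOBS = (
    "temperature", "top_p", "top_k",
    "presence_penalty", "frequency_penalty", "logit_bias",
)

# Assumption for this sketch only: detect reasoning models by name prefix.
REASONING_PREFIXES = ("o1", "o3", "o4")


def effective_settings(provider, name, settings):
    """Return the settings that would be forwarded for this model."""
    is_reasoning = provider == "openai" and name.startswith(REASONING_PREFIXES)
    return {
        key: value
        for key, value in settings.items()
        if value is not None and not (is_reasoning and key in SAMPLING_KNOBS)
    }
```

Unset (None) values are removed as well, which corresponds to leaving a field out so the provider default applies.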

Static Tool Choice

spec.model.tool_choice sets a static tool policy. Only two values are accepted:

  • auto (the provider default): the model decides whether to call a tool.
  • none: tools are disabled and the model produces text only.

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
    tool_choice: none

required and tool-name lists are rejected at load time. Either one, set statically, would force a tool call on every step and prevent the model from ever producing a final response; per-step forcing needs a dynamic capability instead.

Prompt Caching

Since v2026.6.4, spec.model.prompt_cache caches the static prefix of a request (system instructions plus tool definitions) so repeated runs of a role reuse it instead of re-billing those input tokens. This is worthwhile for daemons, triggers, and REPLs whose static prompt dwarfs the per-turn user input.

Enable it with the shorthand, or pass a mapping to tune it:

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
    prompt_cache: true        # caches instructions + tool definitions, 5m TTL

spec:
  model:
    prompt_cache:
      instructions: true       # cache the system prompt (default true)
      tools: true              # cache the tool definitions block (default true)
      ttl: 1h                  # "5m" (default) or "1h"

Caching is available on Anthropic and Bedrock only (it maps to their *_cache_instructions / *_cache_tool_definitions settings) and is rejected at load time on any other provider.
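
The relationship between the shorthand and the mapping form can be sketched in Python. The defaults and the provider restriction are taken from the text above; the function itself is hypothetical, and treating prompt_cache: false as "disabled" on any provider is an assumption.

```python
CACHE_PROVIDERS = {"anthropic", "bedrock"}
CACHE_DEFAULTS = {"instructions": True, "tools": True, "ttl": "5m"}


def normalize_prompt_cache(provider, value):
    """Expand `true` or a partial mapping into a full prompt_cache config."""
    if value is None or value is False:
        return None  # caching disabled (assumed to be valid everywhere)
    if provider not in CACHE_PROVIDERS:
        raise ValueError(f"prompt_cache is not supported on provider {provider!r}")
    cfg = dict(CACHE_DEFAULTS)
    if isinstance(value, dict):
        cfg.update(value)  # explicit keys override the defaults
    if cfg["ttl"] not in ("5m", "1h"):
        raise ValueError('ttl must be "5m" or "1h"')
    return cfg
```

Under this reading, prompt_cache: true and an empty mapping resolve to the same configuration.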

Model Call Retries

Changed in v2026.6.4: model-call retries now live in the httpx transport (PydanticAI's AsyncTenacityTransport). Each request is retried on 429, 500, 502, 503, and 504 with exponential backoff and Retry-After support, uniformly across one-shot, REPL, streaming, and daemon runs. Permanent errors (401, 403, 404, 422) surface immediately. The transport covers OpenAI, Anthropic, Google, Groq, Mistral, Cohere, and custom OpenAI-compatible endpoints; Bedrock and xAI keep their SDK-native retries.

Tune the policy under spec.execution:

spec:
  execution:
    http_retries: 3          # total attempts per request (1-10, default 3)
    http_retry_max_wait: 60  # cap in seconds for one backoff/Retry-After wait (default 60)

Because the httpx transport sits below the agent loop, a retry re-sends only the failed HTTP request; it does not restart the whole agent turn.
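
How the two settings interact can be shown with a small Python model of the retry decision. It is a sketch, not the transport's implementation: the 1-second base delay, the plain doubling, and the absence of jitter are assumptions, and next_wait is a hypothetical name.

```python
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def next_wait(status, attempt, http_retries=3, http_retry_max_wait=60,
              retry_after=None, base=1.0):
    """Seconds to wait before the next attempt, or None to give up.

    `attempt` is the 1-based number of attempts already made.
    """
    if status not in RETRYABLE_STATUSES:
        return None  # permanent errors such as 401 or 404 surface immediately
    if attempt >= http_retries:
        return None  # http_retries counts total attempts, not extra retries
    # Prefer the server's Retry-After hint; otherwise back off exponentially.
    wait = retry_after if retry_after is not None else base * 2 ** (attempt - 1)
    return min(float(wait), float(http_retry_max_wait))
```

With the defaults, a request that keeps returning 503 is attempted three times in total, and a Retry-After of 120 seconds is capped at 60.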

Model Request Concurrency

Since v2026.6.4, spec.model.concurrency caps how many model requests are in flight at once, optionally sharing one budget across several agents in the same process (compose services, team personas, flow nodes). This is the lever for staying under a provider rate limit when many agents share an API key.

spec:
  model:
    concurrency:
      max_running: 4          # max concurrent in-flight requests
      max_queued: 50          # optional: reject once this many are waiting
      share: openai-pool      # optional: agents with the same name share one budget

Without share, the cap is per-agent. With a share name, every agent in the same process whose model config uses that name coordinates against a single budget, so a pool of personas hitting one key can be held to a combined limit. This maps to PydanticAI's ConcurrencyLimitedModel. It is distinct from execution.max_concurrency, which bounds an agent's parallel tool execution; concurrency bounds model requests.
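
The difference between a per-agent cap and a shared budget can be illustrated with asyncio semaphores in Python. This is a conceptual sketch rather than ConcurrencyLimitedModel itself: limiter_for and demo are hypothetical names, and max_queued is left out for brevity.

```python
import asyncio

_shared_limiters = {}  # share name -> semaphore used by every agent with that name


def limiter_for(max_running, share=None):
    """Return a private semaphore, or the process-wide one for `share`."""
    if share is None:
        return asyncio.Semaphore(max_running)  # per-agent cap
    if share not in _shared_limiters:
        _shared_limiters[share] = asyncio.Semaphore(max_running)
    return _shared_limiters[share]


async def demo():
    """Two agents share one pool of 2; report identity and peak concurrency."""
    running = 0
    peak = 0

    async def fake_model_request(limiter):
        nonlocal running, peak
        async with limiter:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for the HTTP round trip
            running -= 1

    agent_a = limiter_for(2, share="openai-pool")
    agent_b = limiter_for(2, share="openai-pool")
    await asyncio.gather(*(fake_model_request(l) for l in (agent_a, agent_b) * 5))
    return agent_a is agent_b, peak
```

Calling asyncio.run(demo()) issues ten requests from two agents, yet no more than two are ever in flight because both agents resolve to the same semaphore.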

Model Config Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| provider | string | (empty) | Provider name. Required unless name contains a colon or resolves via alias. Values: openai, anthropic, google, groq, mistral, cohere, bedrock, xai, ollama |
| name | string | (required) | Model identifier, alias name, or provider:model string |
| base_url | string | null | Custom endpoint URL (triggers OpenAI-compatible mode) |
| api_key_env | string | null | Environment variable containing the API key |
| temperature | float | 0.1 | Sampling temperature (0.0-2.0) |
| max_tokens | int | 4096 | Maximum tokens per response (1-128000) |
| fallback | list of strings | [] | Ordered provider:model strings (or aliases) for automatic failover. Standard providers only. |
| fallback_on | list of strings | [] | Since v2026.6.4. Exception types that trigger failover (ModelAPIError, ModelHTTPError, UnexpectedModelBehavior, ContentFilterError). Empty uses the ModelAPIError default. Requires fallback. |
| concurrency | mapping | null | Since v2026.6.4. Cap concurrent model requests (max_running, max_queued, share). |
| top_p | float | null | Since v2026.6.4. Nucleus sampling threshold (0.0-1.0). Dropped on OpenAI reasoning models. |
| top_k | int | null | Since v2026.6.4. Top-k sampling cutoff. Dropped on OpenAI reasoning models. |
| seed | int | null | Since v2026.6.4. Best-effort deterministic sampling. |
| stop_sequences | list of strings | null | Since v2026.6.4. Sequences that end generation. |
| parallel_tool_calls | bool | null | Since v2026.6.4. Allow multiple tool calls per turn. |
| presence_penalty | float | null | Since v2026.6.4. Penalize present tokens (-2.0-2.0). Dropped on OpenAI reasoning models. |
| frequency_penalty | float | null | Since v2026.6.4. Penalize tokens by frequency (-2.0-2.0). Dropped on OpenAI reasoning models. |
| logit_bias | map of string to int | null | Since v2026.6.4. Per-token likelihood adjustments. Dropped on OpenAI reasoning models. |
| extra_headers | map of string to string | null | Since v2026.6.4. Extra HTTP headers per model request. |
| extra_body | mapping | null | Since v2026.6.4. Extra JSON merged into the provider request body. |
| tool_choice | auto or none | null | Since v2026.6.4. Static tool policy. none disables tools (text-only). required and tool-name lists are rejected. |
| prompt_cache | bool or mapping | null | Since v2026.6.4. Provider-native prompt caching (Anthropic, Bedrock only). |

Embedding Configuration

When using RAG (spec.ingest) or memory (spec.memory), InitRunner needs an embedding model to generate vectors. The embedding provider is resolved separately from the agent's LLM provider.

Default Resolution

The embedding model is determined by the agent's spec.model.provider unless overridden:

| Agent Provider | Default Embedding Model | Requires |
| --- | --- | --- |
| openai | openai:text-embedding-3-small | OPENAI_API_KEY |
| anthropic | openai:text-embedding-3-small | OPENAI_API_KEY |
| google | google:text-embedding-004 | GOOGLE_API_KEY |
| ollama | ollama:nomic-embed-text | Ollama running locally |
| local | local:BAAI/bge-small-en-v1.5 | initrunner[local-embeddings] |
| All others | openai:text-embedding-3-small | OPENAI_API_KEY |

local is not ollama. The local provider runs the embedding model in-process via fastembed with no HTTP hop, no API key, and no separate server. No document text leaves the process. The ollama provider routes through an OpenAI-compatible HTTP client and needs a running Ollama endpoint. Pick local when you want zero external dependencies; pick ollama when you already run Ollama and want to share its model cache.

Important: Anthropic does not offer an embeddings API. If your agent uses provider: anthropic, you still need OPENAI_API_KEY set for embeddings. This only applies when using RAG or memory. Pure chat agents don't need it.
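
The default resolution table can be expressed as a short Python lookup. The mapping values are copied from the table; the function name and the way an explicit embeddings block is merged are assumptions made for this sketch.

```python
# Provider -> default "embedding_provider:model", from the table above.
DEFAULT_EMBEDDINGS = {
    "openai": "openai:text-embedding-3-small",
    "anthropic": "openai:text-embedding-3-small",
    "google": "google:text-embedding-004",
    "ollama": "ollama:nomic-embed-text",
    "local": "local:BAAI/bge-small-en-v1.5",
}
ALL_OTHERS = "openai:text-embedding-3-small"


def resolve_embedding(agent_provider, embeddings_cfg=None):
    """Return "provider:model" for embeddings, honoring explicit overrides."""
    cfg = embeddings_cfg or {}
    # An empty embeddings.provider derives from spec.model.provider.
    provider = cfg.get("provider") or agent_provider
    default = DEFAULT_EMBEDDINGS.get(provider, ALL_OTHERS)
    embedding_provider, _, default_model = default.partition(":")
    # An empty embeddings.model uses the provider default.
    return f"{embedding_provider}:{cfg.get('model') or default_model}"
```

The anthropic row is the one worth noticing: the lookup lands on an OpenAI model, which is why OPENAI_API_KEY is still required for RAG or memory.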

Overriding the Embedding Model

Set embeddings.provider and embeddings.model in your ingest or memory config:

spec:
  model:
    provider: anthropic
    name: claude-sonnet-4-6
  ingest:
    sources: ["./docs/**/*.md"]
    embeddings:
      provider: openai
      model: text-embedding-3-large

Local in-process embeddings (fastembed)

The local provider embeds text on the same machine that runs the agent, with no HTTP request and no API key. It uses fastembed, which ships quantized ONNX models and does not pull in PyTorch. Install the extra:

uv pip install "initrunner[local-embeddings]"

Then set provider: local in your ingest or memory embeddings config:

spec:
  ingest:
    sources: ["./docs/**/*.md"]
    embeddings:
      provider: local
      model: BAAI/bge-small-en-v1.5   # 384 dimensions, default; omit to use it

This works the same way under spec.memory.embeddings. The local provider takes no base_url and no api_key_env; those fields are ignored for it.

The model is downloaded from Hugging Face on first use (a few hundred MB) and cached on disk; later runs load it from the cache. The first embedding call after process start pays a one-time load cost. Choose a larger model for higher retrieval quality at the cost of speed and a different vector dimension:

| Model | Dimensions | Notes |
| --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | Default. Fast on CPU, good quality. |
| BAAI/bge-base-en-v1.5 | 768 | Larger, slower, higher quality. |
| BAAI/bge-large-en-v1.5 | 1024 | Largest of the family. |

Run python -c "from fastembed import TextEmbedding; print([m['model'] for m in TextEmbedding.list_supported_models()])" to list every model fastembed supports.

Dimension consistency. A store (RAG index or memory store) is locked to the embedding dimension of the model that first wrote to it. You cannot query or extend that store with a model of a different dimension: switching from BAAI/bge-small-en-v1.5 (384) to BAAI/bge-base-en-v1.5 (768), or between local and any HTTP provider whose vectors differ in size, raises a DimensionMismatchError on reopen. To change the embedding model, point the agent at a fresh store_path and re-ingest.
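
The dimension lock can be pictured with a few lines of Python. The exception class here is a local stand-in that only borrows the name mentioned above, and check_store is a hypothetical helper; the dimensions come from the model table.

```python
class DimensionMismatchError(ValueError):
    """Local stand-in for the error named above; defined only for this sketch."""


MODEL_DIMENSIONS = {
    "BAAI/bge-small-en-v1.5": 384,
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
}


def check_store(store_dimension, model):
    """Return the dimension the store is (or becomes) locked to.

    `store_dimension` is None for a fresh store that has no vectors yet.
    """
    model_dimension = MODEL_DIMENSIONS[model]
    if store_dimension is not None and store_dimension != model_dimension:
        raise DimensionMismatchError(
            f"store holds {store_dimension}-d vectors but {model} produces "
            f"{model_dimension}-d; use a fresh store_path and re-ingest"
        )
    return model_dimension
```

A fresh store accepts any model and is locked from that point on; reopening a 384-dimension store with the 768-dimension model fails instead of silently mixing vector sizes.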

CPU performance. fastembed runs on CPU by default. It is fast for typical document sets, but for very large batches expect throughput to be lower than a hosted GPU endpoint. Ingestion batches embeddings, so this is rarely a problem for one-time indexing.

Custom Embedding Endpoints

For self-hosted or third-party embedding services, use base_url and api_key_env:

spec:
  ingest:
    embeddings:
      provider: openai
      model: my-embedding-model
      base_url: https://my-embedding-service.example.com/v1
      api_key_env: MY_EMBEDDING_API_KEY

Embedding Config Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| provider | str | "" | Embedding provider. Empty string derives from spec.model.provider. Use local for in-process fastembed (no HTTP, no key). |
| model | str | "" | Embedding model name. Empty string uses the provider default. |
| base_url | str | "" | Custom endpoint URL. Triggers OpenAI-compatible mode. |
| api_key_env | str | "" | Env var holding the embedding API key. Works for all providers (not just custom endpoints). When empty, the default key for the resolved provider is used automatically. |

See Ingestion: Embedding Models for the full embedding model reference and RAG Guide: Embedding Model Options for a comparison table.

Full Role Example

A complete role definition showing model, tools, ingestion, triggers, and guardrails:

apiVersion: initrunner/v1
kind: Agent
metadata:
  name: support-agent
  description: Answers questions from the support knowledge base
  tags:
    - support
    - rag
spec:
  role: |
    You are a support agent. Use search_documents to find relevant
    articles before answering. Always cite your sources.
  model:
    provider: openai
    name: gpt-5-mini
    temperature: 0.1
    max_tokens: 4096
  ingest:
    sources:
      - "./knowledge-base/**/*.md"
      - "./docs/**/*.pdf"
    chunking:
      strategy: fixed
      chunk_size: 512
      chunk_overlap: 50
  tools:
    - type: filesystem
      root_path: ./src
      read_only: true
    - type: mcp
      transport: stdio
      command: npx
      args: ["-y", "@anthropic/mcp-server-filesystem"]
  triggers:
    - type: file_watch
      paths: ["./knowledge-base"]
      extensions: [".html", ".md"]
      prompt_template: "Knowledge base updated: {path}. Re-index."
    - type: cron
      schedule: "0 9 * * 1"
      prompt: "Generate weekly support coverage report."
  guardrails:
    max_tokens_per_run: 50000
    max_tool_calls: 20
    timeout_seconds: 300
    max_request_limit: 50

Architecture

YAML role files define the agent. The loader parses and validates them, then constructs a PydanticAI agent wired with the configured tools, stores, and audit logger. The runner executes the agent in one of three modes: single-shot, interactive REPL, or trigger-driven daemon. For multi-agent workflows, a flow definition orchestrates multiple agents with inter-agent delegation and health monitoring.
