InitRunner

RAG Patterns & Guide

This guide covers practical patterns for using InitRunner's retrieval-augmented generation (RAG) capabilities. For the full configuration reference, see the Ingestion and Memory pages.

RAG vs Memory: When to Use Which

InitRunner has two systems for giving agents access to information beyond their training data:

| Aspect      | Ingestion (RAG)                 | Memory                              |
|-------------|---------------------------------|-------------------------------------|
| Purpose     | Search external documents       | Remember learned information        |
| Data source | Files on disk, URLs             | Agent's own observations            |
| Who writes  | You (via initrunner ingest)     | Agent (via remember() tool)         |
| Who reads   | Agent (via search_documents())  | Agent (via recall())                |
| Best for    | Knowledge base Q&A, doc search  | Personalization, context carry-over |
| Persistence | Rebuilt on each ingest run      | Accumulates across sessions         |

You can use both together: ingestion for your docs, memory for user preferences.

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
  memory:
    semantic:
      max_memories: 500

End-to-End Walkthrough

1. Create a role with ingestion

Create role.yaml:

apiVersion: initrunner/v1
kind: Agent
metadata:
  name: docs-agent
  description: Documentation Q&A agent
spec:
  role: |
    You are a documentation assistant. ALWAYS call search_documents
    before answering questions. Cite your sources.
  model:
    provider: openai
    name: gpt-4o-mini
  ingest:
    sources:
      - "./docs/**/*.md"
    chunking:
      strategy: paragraph
      chunk_size: 512
      chunk_overlap: 50

2. Add some documents

Create a docs/ directory with markdown files:

docs/
├── getting-started.md
├── api-reference.md
└── faq.md

3. Ingest documents

$ initrunner ingest role.yaml
Ingesting documents for docs-agent...
 Stored 47 chunks from 3 files

4. Run the agent

$ initrunner run role.yaml -p "How do I authenticate?"

The agent calls search_documents("authenticate") behind the scenes, retrieves matching chunks from your docs, and uses them to answer.
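Conceptually, the default vector retrieval behind search_documents embeds the query and ranks stored chunks by cosine similarity. The sketch below illustrates that ranking step only; the tiny hand-written vectors stand in for a real embedding model, and the helper names (cosine, top_k_chunks) are hypothetical, not InitRunner internals.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunks, k=2):
    """Hypothetical helper: return the k (source, text) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), source, text) for source, text, vec in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for _, source, text in scored[:k]]

# Made-up 3-dimensional "embeddings"; real models produce hundreds of dimensions.
chunks = [
    ("./docs/getting-started.md", "Generate an API key under Settings > API Keys.", [0.9, 0.1, 0.0]),
    ("./docs/api-reference.md", "The default rate limit is 100 requests per minute.", [0.1, 0.9, 0.1]),
    ("./docs/faq.md", "Authentication uses a bearer token in the header.", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.1, 0.0]  # stands in for the embedding of "authenticate"

for source, text in top_k_chunks(query_vec, chunks):
    print(source, "->", text)
```

The retrieved (source, text) pairs are what lets the agent cite file paths in its answer, as in the interactive session below.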

5. Interactive session

$ initrunner run role.yaml -i
docs-agent> How do I get an API key?

I found the answer in your documentation. Per the Getting Started guide
(./docs/getting-started.md), you can generate an API key by navigating to
Settings > API Keys in your dashboard...

docs-agent> What rate limits apply?

According to the API Reference (./docs/api-reference.md), the default rate
limit is 100 requests per minute per API key...

Choosing an Embedding Model

The embedding model determines how well semantic search performs. Models trade off dimension size, cost, speed, and quality against one another.

| Model                  | Provider          | Dimensions | Notes                                                                   |
|------------------------|-------------------|------------|-------------------------------------------------------------------------|
| text-embedding-3-small | OpenAI            | 1536       | Fast and cheap, a good default for most use cases                       |
| text-embedding-3-large | OpenAI            | 3072       | Higher quality at higher cost                                           |
| text-embedding-004     | Google            | 768        | Cost-effective; strong multilingual support                             |
| nomic-embed-text       | Ollama            | 768        | Fully local, no API key or network needed                               |
| BAAI/bge-small-en-v1.5 | local (fastembed) | 384        | Runs in-process, no HTTP hop, no API key; needs the local-embeddings extra |

Which model should I use?

  • Cost-sensitive: Google text-embedding-004 or Ollama nomic-embed-text
  • Precision-critical: OpenAI text-embedding-3-large
  • Fully local / no API keys: Ollama nomic-embed-text
  • Truly offline / no external API: the local provider (fastembed) runs the model in-process. The default BAAI/bge-small-en-v1.5 (384 dims) is a good start; BAAI/bge-base-en-v1.5 (768 dims) is higher quality but needs a fresh store_path, since changing the embedding dimension is not backward compatible.
  • Google ecosystem: Google text-embedding-004
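Switching models is not backward compatible because every vector already in the store has the old model's length. A minimal guard might look like the sketch below, using the dimensions from the table above. The check_store_dimension helper, the error class, and the "local:" key prefix are hypothetical illustrations, not InitRunner's actual code or configuration syntax.

```python
# Output dimensions from the embedding model table on this page.
# Keys use a "provider:model" form; the "local:" prefix is an assumption.
EMBEDDING_DIMS = {
    "openai:text-embedding-3-small": 1536,
    "openai:text-embedding-3-large": 3072,
    "google:text-embedding-004": 768,
    "ollama:nomic-embed-text": 768,
    "local:BAAI/bge-small-en-v1.5": 384,
    "local:BAAI/bge-base-en-v1.5": 768,
}

class DimensionMismatchError(ValueError):
    """Raised when a store's vectors do not match the configured model's dimension."""

def check_store_dimension(stored_dim: int, model: str) -> None:
    """Hypothetical guard: refuse to reuse a store built with a different vector length."""
    expected = EMBEDDING_DIMS[model]
    if stored_dim != expected:
        raise DimensionMismatchError(
            f"store holds {stored_dim}-dim vectors but {model} produces {expected}; "
            "use a fresh store_path and re-ingest"
        )

check_store_dimension(384, "local:BAAI/bge-small-en-v1.5")  # same length: no error
try:
    check_store_dimension(384, "local:BAAI/bge-base-en-v1.5")  # 384 vs 768
except DimensionMismatchError as err:
    print(err)
```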

The default (openai:text-embedding-3-small) is a sensible starting point for most projects. See Providers for the full embedding configuration reference and how to override the default.

Common Patterns

Basic knowledge base

Single format, paragraph chunking for natural document boundaries:

ingest:
  sources:
    - "./knowledge-base/**/*.md"
  chunking:
    strategy: paragraph
    chunk_size: 512
    chunk_overlap: 50
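To make chunk_size and chunk_overlap concrete, here is a minimal sketch of paragraph-style chunking. It assumes sizes are counted in characters and that paragraphs are separated by blank lines; InitRunner's actual unit and splitting rules may differ, and paragraph_chunks is a hypothetical name.

```python
def paragraph_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks until adding the next
    paragraph would exceed chunk_size characters. Each new chunk starts with the
    last chunk_overlap characters of the previous one. Paragraphs are never split,
    so a single oversized paragraph (plus overlap) can exceed chunk_size."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size or not current:
            current = candidate  # still fits, or the chunk is empty
        else:
            chunks.append(current)
            # Guard against current[-0:], which would copy the whole string.
            overlap = current[-chunk_overlap:] if chunk_overlap else ""
            current = f"{overlap}\n\n{para}" if overlap else para
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in paragraph_chunks(doc, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

Keeping paragraphs whole is what gives this strategy its "natural document boundaries"; the overlap preserves a little context across chunk edges.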

Multi-format knowledge base

Mix HTML, Markdown, and PDF sources. Install initrunner[ingest] for PDF support:

ingest:
  sources:
    - "./docs/**/*.md"
    - "./docs/**/*.html"
    - "./docs/**/*.pdf"
  chunking:
    strategy: fixed
    chunk_size: 1024
    chunk_overlap: 100
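The fixed strategy ignores document structure and slides a window over the text. A minimal sketch, assuming chunk_size and chunk_overlap are character counts (an assumption; fixed_chunks is a hypothetical helper):

```python
def fixed_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters. Consecutive windows
    share chunk_overlap characters, so text near a boundary appears in both
    neighbouring chunks."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(fixed_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Because a fixed window does not depend on Markdown, HTML, or PDF layout, it behaves the same across mixed formats.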

URL-based ingestion

Ingest content from remote URLs alongside local files:

ingest:
  sources:
    - "./local-docs/**/*.md"
    - "https://docs.example.com/api/reference"
    - "https://docs.example.com/changelog"

URL content is hashed, so re-running ingest skips unchanged pages.
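The skip-unchanged behaviour can be illustrated with a content hash per URL. This sketch assumes SHA-256 and an in-memory dict of previous hashes; InitRunner's actual hash function and storage are not specified here, and pages_to_reingest is a hypothetical helper.

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Hex digest of a page body (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()

def pages_to_reingest(fetched: dict[str, bytes], previous: dict[str, str]) -> list[str]:
    """Return URLs whose body hash differs from the hash recorded on the last run.
    URLs never seen before have no previous hash, so they are always included."""
    return [url for url, body in fetched.items() if previous.get(url) != content_hash(body)]

previous = {
    "https://docs.example.com/api/reference": content_hash(b"reference v1"),
    "https://docs.example.com/changelog": content_hash(b"changelog v1"),
}
fetched = {
    "https://docs.example.com/api/reference": b"reference v1",  # unchanged
    "https://docs.example.com/changelog": b"changelog v2",      # edited
}
print(pages_to_reingest(fetched, previous))
```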

Running on source changes with a file watch trigger

Since v2026.4.10, source changes are detected on every initrunner run automatically, so you don't need a trigger just to keep the index fresh. Reach for a file_watch trigger when you want the agent to actually run on change (for example, to summarize the edit or notify a channel), not just re-ingest:

spec:
  ingest:
    sources:
      - "./knowledge-base/**/*.md"
  triggers:
    - type: file_watch
      paths:
        - ./knowledge-base
      extensions:
        - .md
      prompt_template: "Knowledge base updated: {path}. Summarize what changed."
      debounce_seconds: 1.0
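debounce_seconds keeps a burst of saves (for example, an editor writing several files at once) from firing the agent repeatedly. The sketch below assumes trailing-edge debouncing: events closer together than debounce_seconds merge into one burst, which fires once after the burst goes quiet. InitRunner's exact semantics may differ, and debounce is a hypothetical helper that works on timestamps rather than live file events.

```python
def debounce(event_times: list[float], debounce_seconds: float = 1.0) -> list[float]:
    """Return the times at which a trailing-edge debounced trigger would fire.
    A gap of at least debounce_seconds between events ends the current burst;
    each burst fires debounce_seconds after its last event."""
    fires: list[float] = []
    burst_end = None
    for t in sorted(event_times):
        if burst_end is not None and t - burst_end >= debounce_seconds:
            fires.append(burst_end + debounce_seconds)  # previous burst went quiet
        burst_end = t
    if burst_end is not None:
        fires.append(burst_end + debounce_seconds)
    return fires

# Three quick saves, then one more a few seconds later: two agent runs, not four.
print(debounce([0.0, 0.25, 0.5, 5.0], debounce_seconds=1.0))
```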

Using the source filter to scope searches

When your knowledge base spans multiple topics, use the source parameter to narrow results:

spec:
  role: |
    You are a support agent. When the user asks about billing, search
    only billing docs: search_documents(query, source="*billing*").
    For technical issues, search only troubleshooting docs:
    search_documents(query, source="*troubleshooting*").
  ingest:
    sources:
      - "./kb/billing/**/*.md"
      - "./kb/troubleshooting/**/*.md"
      - "./kb/general/**/*.md"

Hybrid retrieval (vector + keyword)

Dense vector search matches on meaning, so it can miss exact tokens like identifiers, error codes, version strings, and acronyms that do not have a strong semantic signal. Hybrid retrieval runs both a dense vector search and a BM25 full-text search, then fuses the two result lists with reciprocal rank fusion (RRF).

Set the strategy on spec.ingest.retriever:

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid

The three strategies:

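| Strategy      | What it does                                                      | Extra dependency              |
|---------------|-------------------------------------------------------------------|-------------------------------|
| vector        | Dense cosine search only. The default, unchanged behaviour.       | None                          |
| hybrid        | RRF fusion of dense vector and BM25 full-text results.            | None (RRF ships with LanceDB) |
| hybrid_rerank | Hybrid fusion, then a cross-encoder reranks the fused results.    | Optional (see below)          |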
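Reciprocal rank fusion combines the two ranked lists using only positions, not raw scores, so cosine similarities and BM25 scores never need to be put on the same scale. The sketch uses the standard formula, score(d) = sum over lists of 1 / (k + rank), with rank starting at 1; it assumes the rrf_k setting plays the role of k. rrf_fuse is a hypothetical helper, not LanceDB's API.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a document's score.
    Documents are returned by descending fused score."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_hits = ["auth-guide", "api-keys", "faq"]           # dense results, best first
keyword_hits = ["faq", "auth-guide", "error-codes"]       # BM25 results, best first
print(rrf_fuse([vector_hits, keyword_hits]))
```

Documents that appear in both lists (here auth-guide and faq) rise to the top, which is the point of fusing the two retrievers.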

The BM25 full-text index is built automatically on the next initrunner ingest, so an existing store picks up hybrid search after a re-ingest. No new required dependency is added for vector or hybrid.

hybrid_rerank adds a cross-encoder pass on top of hybrid for higher precision. The cross-encoder backend (sentence-transformers) is optional. When it is not installed, hybrid_rerank falls back to plain hybrid (RRF) instead of failing. Install it with:

$ uv pip install sentence-transformers
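The fallback described above can be pictured as a check for the optional package before choosing the strategy. This is a minimal sketch, assuming detection via importlib.util.find_spec; effective_strategy is a hypothetical name and InitRunner's real check may differ.

```python
import importlib.util
from typing import Optional

def effective_strategy(requested: str, reranker_installed: Optional[bool] = None) -> str:
    """Downgrade hybrid_rerank to hybrid when the cross-encoder backend is missing.
    If reranker_installed is None, probe for the sentence_transformers package."""
    if reranker_installed is None:
        reranker_installed = importlib.util.find_spec("sentence_transformers") is not None
    if requested == "hybrid_rerank" and not reranker_installed:
        return "hybrid"  # fall back to plain RRF instead of failing
    return requested

print(effective_strategy("hybrid_rerank"))  # depends on what is installed locally
```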

The tunable retriever settings, shown with their default values:

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid_rerank
      reranker_model: cross-encoder/ms-marco-MiniLM-L-6-v2
      rrf_k: 60

An agent can override the configured mode for a single call with the strategy parameter on the search tool: search_documents(query, strategy="hybrid"). The accepted values are the same three: vector, hybrid, and hybrid_rerank.
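The precedence is: a per-call strategy wins over spec.ingest.retriever.strategy, which wins over the vector default. A minimal sketch of that resolution with validation against the three accepted values; resolve_strategy is a hypothetical helper, not part of InitRunner's API.

```python
from typing import Optional

STRATEGIES = ("vector", "hybrid", "hybrid_rerank")

def resolve_strategy(configured: Optional[str] = None, per_call: Optional[str] = None) -> str:
    """Per-call override > configured strategy > "vector" default."""
    chosen = per_call or configured or "vector"
    if chosen not in STRATEGIES:
        raise ValueError(f"unknown strategy {chosen!r}; expected one of {STRATEGIES}")
    return chosen

print(resolve_strategy(configured="hybrid_rerank", per_call="hybrid"))
```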

Fully local RAG with Ollama

No external API keys needed. Use Ollama for both the LLM and embeddings:

spec:
  model:
    provider: ollama
    name: llama3.2
  ingest:
    sources:
      - "./docs/**/*.md"
    embeddings:
      provider: ollama
      model: nomic-embed-text

See the Providers page for Ollama setup instructions.
