InitRunner

Testing

InitRunner includes built-in tools for testing agents before deploying them — schema validation, dry-run mode (no API calls), and an eval-style test suite runner.

Validation

Validate a role YAML against the schema without running the agent:

initrunner validate role.yaml

This checks:

  • YAML syntax and structure
  • Required fields (apiVersion, kind, metadata.name, spec.role)
  • Field types and value ranges (e.g. temperature between 0.0 and 2.0)
  • Tool configurations (valid types, required fields per type)
  • Skill references (file exists, frontmatter is valid)
  • Trigger configurations (valid cron expressions, valid paths)
  • Security policy structure

Validation exits with code 0 on success and non-zero on failure, making it suitable for CI pipelines.
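Because of that exit-code contract, validation drops straight into any CI system. As a sketch, a GitHub Actions step might look like this (the workflow name, install command, and paths are illustrative assumptions, not part of InitRunner's documentation):

```yaml
# .github/workflows/validate.yml (illustrative)
name: validate-roles
on: [push]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install initrunner          # assumes a pip-installable package
      - run: initrunner validate role.yaml   # non-zero exit fails the job
```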

Dry-Run Mode

Run an agent without making any LLM API calls:

initrunner run role.yaml --dry-run -p "Test prompt"

Dry-run mode replaces the configured model with a TestModel that returns deterministic placeholder responses. This lets you verify:

  • Tool registration and discovery
  • Trigger configuration and startup
  • Memory system initialization
  • Skill loading and merging
  • Guardrail enforcement logic
  • Sink configuration

No API keys are required and no tokens are consumed. Use dry-run mode during development to catch configuration errors before spending on API calls.
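Conceptually, a dry-run model is just a deterministic stand-in: same prompt in, same placeholder out, no network involved. A minimal sketch of the idea (this is illustrative, not InitRunner's actual TestModel class):

```python
class StubModel:
    """Deterministic stand-in for an LLM: no network calls, no tokens consumed."""

    def complete(self, prompt: str) -> str:
        # Always return the same placeholder for a given prompt, so runs are
        # reproducible and configuration errors surface before any API spend.
        return f"[dry-run] would have answered: {prompt!r}"


model = StubModel()
print(model.complete("Test prompt"))
# → [dry-run] would have answered: 'Test prompt'
```

Because the response is a function of the prompt alone, repeated dry runs are byte-for-byte reproducible, which is what makes them useful for catching config regressions.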

Test Suites

The initrunner test command runs structured test suites against an agent using an eval framework.

initrunner test role.yaml -s test_suite.yaml

Test Suite Format

A test suite is a YAML file defining test cases with inputs and expected outcomes:

name: support-agent-tests
description: Regression tests for the support agent

tests:
  - name: answers_product_question
    prompt: "What is the return policy?"
    assertions:
      - type: contains
        value: "30 days"
      - type: contains
        value: "refund"

  - name: rejects_off_topic
    prompt: "What's the weather like?"
    assertions:
      - type: not_contains
        value: "forecast"
      - type: max_tokens
        value: 200

  - name: uses_search_tool
    prompt: "Find articles about shipping delays"
    assertions:
      - type: tool_called
        value: search_documents
      - type: contains
        value: "shipping"

  - name: stays_within_budget
    prompt: "Write a comprehensive guide to our product line"
    assertions:
      - type: max_tokens
        value: 4096
      - type: max_tool_calls
        value: 10

Assertion Types

Type             Description
contains         Output contains the specified string (case-insensitive)
not_contains     Output does not contain the specified string
regex            Output matches the regex pattern
max_tokens       Output token count is within the limit
max_tool_calls   Number of tool calls is within the limit
tool_called      The specified tool was invoked during the run
tool_not_called  The specified tool was not invoked
exit_status      Run completed with the expected status (success or error)
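InitRunner's actual evaluation logic is internal; as a rough illustration, the string- and tool-based assertion types above could be modeled like this (the function name and signature are hypothetical, and the token-count checks are omitted because they depend on the model's tokenizer):

```python
import re


def check_assertion(assertion: dict, output: str, tool_calls: list[str]) -> bool:
    """Evaluate one assertion against a run's output and tool-call log."""
    kind, value = assertion["type"], assertion["value"]
    if kind == "contains":        # case-insensitive substring match
        return value.lower() in output.lower()
    if kind == "not_contains":
        return value.lower() not in output.lower()
    if kind == "regex":           # pattern may match anywhere in the output
        return re.search(value, output) is not None
    if kind == "tool_called":
        return value in tool_calls
    if kind == "tool_not_called":
        return value not in tool_calls
    if kind == "max_tool_calls":
        return len(tool_calls) <= value
    raise ValueError(f"unsupported assertion type: {kind}")


# Example: the 'uses_search_tool' case from the suite above
output = "Here are three articles about shipping delays."
calls = ["search_documents"]
print(check_assertion({"type": "tool_called", "value": "search_documents"}, output, calls))  # True
print(check_assertion({"type": "contains", "value": "shipping"}, output, calls))             # True
```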

Running Tests

# Run a test suite
initrunner test role.yaml -s test_suite.yaml

# Dry-run tests (no API calls, uses TestModel)
initrunner test role.yaml -s test_suite.yaml --dry-run

# Verbose output
initrunner test role.yaml -s test_suite.yaml -v

Flag           Type  Default     Description
-s, --suite    str   (required)  Path to the test suite YAML
--dry-run      bool  false       Use TestModel instead of real API calls
-v, --verbose  bool  false       Show full output for each test case

Test Output

Running suite: support-agent-tests (4 tests)

  ✓ answers_product_question (1.2s, 340 tokens)
  ✓ rejects_off_topic (0.8s, 95 tokens)
  ✓ uses_search_tool (2.1s, 520 tokens)
  ✗ stays_within_budget
      FAIL: max_tokens — expected ≤4096, got 4301

Results: 3 passed, 1 failed (4.1s total)

Testing Workflow

A practical workflow for developing and testing agents:

  1. Validate — catch schema errors early:

     initrunner validate role.yaml

  2. Dry-run — verify tool registration and config without API calls:

     initrunner run role.yaml --dry-run -p "Test prompt"

  3. Interactive test — manual testing in REPL mode:

     initrunner run role.yaml -i

  4. Suite test — run automated assertions against real model output:

     initrunner test role.yaml -s tests/regression.yaml

  5. CI integration — validate and dry-run in CI, suite tests on schedule:

     # In CI pipeline
     initrunner validate role.yaml
     initrunner test role.yaml -s tests/smoke.yaml --dry-run
