Multimodal Input

InitRunner supports sending images, audio, video, and documents alongside text prompts. Multimodal input works across the CLI, interactive REPL, OpenAI-compatible API server, and web dashboard.

Supported File Types

Category	Extensions	Notes
Image	`.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`	Most models support these natively
Audio	`.mp3`, `.wav`, `.ogg`, `.flac`, `.aac`	Requires model support (e.g. `gpt-4o-audio-preview`)
Video	`.mp4`, `.webm`, `.mov`, `.mkv`	Limited model support
Document	`.pdf`, `.docx`, `.xlsx`	Sent as binary content
Text	`.txt`, `.md`, `.csv`, `.html`	Inlined as text in the prompt

Size limit: 20 MB per file.

CLI Usage

Use --attach (or -A) to attach files or URLs to a prompt. The flag is repeatable.

# Single file
initrunner run role.yaml -p "Describe this image" -A photo.png

# Multiple files
initrunner run role.yaml -p "Compare these" -A before.png -A after.png

# URL attachment
initrunner run role.yaml -p "What's in this image?" -A https://example.com/photo.jpg

# Mixed files and URLs
initrunner run role.yaml -p "Summarize" -A report.pdf -A https://example.com/chart.png

--attach requires -p (or piped stdin). Without a prompt, the command exits with an error.

Interactive REPL

In interactive mode (-i), three commands manage attachments:

Command	Description
`/attach <path_or_url>`	Queue a file or URL for the next prompt
`/attachments`	List queued attachments
`/clear-attachments`	Clear all queued attachments

Queued attachments are sent with your next message and then cleared automatically.

> /attach diagram.png
Queued attachment: diagram.png
> /attach notes.pdf
Queued attachment: notes.pdf
> /attachments
  1. diagram.png
  2. notes.pdf
> What do these show?
[assistant response with both attachments]
> /attachments
No attachments queued.

Server API (OpenAI Format)

The initrunner run --serve endpoint accepts multimodal content in the standard OpenAI format. The content field of a ChatMessage can be a string or a list of content parts.

Content Part Types

Type	Field	Description
`text`	`text`	Plain text content
`image_url`	`image_url`	Image via HTTP URL or base64 `data:` URI
`input_audio`	`input_audio`	Audio as base64 with format specifier

Image via URL

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'

Image via Base64

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
      ]
    }]
  }'

Audio Input

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Transcribe this audio."},
        {"type": "input_audio", "input_audio": {"data": "<base64>", "format": "mp3"}}
      ]
    }]
  }'

The format field defaults to "mp3" if omitted.

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="my-agent",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)

Web Dashboard

The chat interface supports file uploads via a button or drag-and-drop.

Upload flow:

Files are uploaded to POST /roles/{role_id}/chat/upload and staged in memory
The server returns a list of attachment IDs
Attachment IDs are passed to the SSE stream endpoint with the next prompt
Staged files expire after 5 minutes if unused

Limits: 20 MB per file, same supported file types as the CLI.

Dashboard

In the dashboard chat, use the upload button or drag-and-drop to attach files. The same file type restrictions and 20 MB size limit apply.

Model Support

Not all models support all modalities. If a model doesn't support a given content type, the provider API will return an error.

Modality	Example models
Images	`gpt-5.4`, `gpt-5-mini`, `claude-sonnet-4-6`, `gemini-2.5-flash`
Audio	`gpt-4o-audio-preview`
Video	`gemini-2.5-flash`
Documents (PDF)	`gpt-5.4`, `claude-sonnet-4-6`, `gemini-2.5-flash`

When in doubt, use gpt-5.4 or a Claude model for broad multimodal support.

Error Handling

Condition	Error
File not found	`Attachment file not found: <path>`
No file extension	`Cannot determine file type — file has no extension: <path>`
Unsupported extension	`Unsupported file type '<ext>' for: <path>. Supported: ...`
File exceeds 20 MB	`File too large (<size> MB): <path>. Maximum: 20 MB`
Dashboard upload too large	`File too large: <filename> (max 20 MB)` (HTTP 400)

In the interactive REPL, attachment errors are printed and the prompt is not sent. In the CLI, the command exits with a non-zero status.

On this page