Runner
The Runner is the execution engine that powers agent runs in herdctl. It integrates with the Claude Agent SDK to execute agents, stream output in real-time, and manage the full job lifecycle.
Architecture Overview
```
┌──────────────────────────────────────────────────┐
│                    JobExecutor                   │
├──────────────────────────────────────────────────┤
│ 1. Create job record                             │
│ 2. Transform config → SDK options (sdk-adapter)  │
│ 3. Execute SDK query (async iterator)            │
│ 4. Process messages (message-processor)          │
│ 5. Stream output to JSONL                        │
│ 6. Update job status and session info            │
└──────────────────────────────────────────────────┘
        │                 │                  │
        ▼                 ▼                  ▼
  ┌──────────┐    ┌──────────────┐    ┌────────────┐
  │   SDK    │    │    State     │    │   Error    │
  │  Adapter │    │  Management  │    │  Handler   │
  └──────────┘    └──────────────┘    └────────────┘
```
The runner module consists of four main components:
| Component | File | Purpose |
|---|---|---|
| JobExecutor | job-executor.ts | Main execution engine and lifecycle manager |
| SDK Adapter | sdk-adapter.ts | Transforms agent config to SDK format |
| Message Processor | message-processor.ts | Validates and transforms SDK messages |
| Error Handler | errors.ts | Classifies errors and provides diagnostics |
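Conceptually, a single run flows through these components as a simple pipeline. The sketch below is illustrative only; the helper names (createJobRecord, toSDKOptions, finalizeJob, classifyError) are placeholders for the internals rather than the exact exported API.

```ts
// Illustrative sketch of the JobExecutor lifecycle; helper names are placeholders.
async function executeJob(agent: AgentConfig, prompt: string): Promise<RunnerResult> {
  const jobsDir = '.herdctl/jobs';                       // Matches the JSONL path used elsewhere on this page
  const job = await createJobRecord(agent);              // 1. Create job record
  const sdkOptions = toSDKOptions(agent);                // 2. sdk-adapter: config → SDK options

  try {
    // 3. Execute SDK query (async iterator)
    for await (const message of sdkQuery({ prompt, options: sdkOptions })) {
      const processed = processSDKMessage(message);      // 4. message-processor
      await appendJobOutput(jobsDir, job.id, processed.output); // 5. Stream output to JSONL
    }
    return finalizeJob(job, { success: true });          // 6. Update job status and session info
  } catch (err) {
    return finalizeJob(job, { success: false, error: classifyError(err) }); // errors.ts
  }
}
```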
SDK Integration
The runner integrates with the Claude Agent SDK using an async iterator pattern. This enables real-time streaming of agent output without buffering.
Async Iterator Pattern
The SDK’s query() function returns an AsyncIterable<SDKMessage>, which the runner consumes:
```ts
// SDK query function signature
type SDKQueryFunction = (params: {
  prompt: string;
  options?: Record<string, unknown>;
  abortController?: AbortController;
}) => AsyncIterable<SDKMessage>;

// Execution loop
const messages = sdkQuery({ prompt, options: sdkOptions });

for await (const message of messages) {
  // Process each message as it arrives
  const processed = processSDKMessage(message);

  // Write immediately to JSONL (no buffering)
  await appendJobOutput(jobsDir, jobId, processed.output);

  // Check for terminal message
  if (processed.isFinal) {
    break;
  }
}
```
Key Benefits
- Real-time streaming: Messages appear immediately in job output
- Memory efficiency: No buffering of large outputs
- Concurrent readers: Other processes can tail the JSONL file
- Graceful shutdown: Can stop mid-execution via AbortController (see the sketch below)
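The last point deserves a concrete illustration. Here is a minimal sketch of cancelling a run mid-execution, under the assumption that the AbortController is passed straight through to the SDK query as the signature above suggests:

```ts
// Sketch: stop a run early via AbortController (assumes the runner forwards it to the SDK query).
const abortController = new AbortController();

// Cancel on Ctrl+C so the JSONL output ends at the last fully written message
process.on('SIGINT', () => abortController.abort());

const messages = sdkQuery({ prompt, options: sdkOptions, abortController });

for await (const message of messages) {
  await appendJobOutput(jobsDir, jobId, processSDKMessage(message).output);
  // Iteration ends once abort() is called and the SDK stops yielding messages
}
```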
Permission Modes
The runner supports four permission modes that control how tool calls are approved:
Mode Comparison
| Mode | Description | Auto-Approved Tools |
|---|---|---|
| default | Requires approval for everything | None |
| acceptEdits | Auto-approves file operations (the default) | Read, Write, Edit, mkdir, rm, mv, cp |
| bypassPermissions | Auto-approves all tools | All tools |
| plan | Planning only, no execution | None |
Configuration
Set the permission mode in your agent configuration:
```yaml
name: my-agent

permissions:
  mode: acceptEdits  # default, acceptEdits, bypassPermissions, plan

  # Optional: explicitly allow specific tools
  allowed_tools:
    - Bash
    - Read
    - Write

  # Optional: deny specific tools
  denied_tools:
    - mcp__github__create_issue
```
Permission Examples
Default Mode (Safest)
Every tool call requires human approval:
```yaml
permissions:
  mode: default
```
Use for: High-stakes operations, new agents, untested workflows.
Accept Edits Mode (Recommended)
File operations are auto-approved; other tools require approval:
```yaml
permissions:
  mode: acceptEdits
```
Use for: Most development workflows where file edits are the primary action.
Bypass Permissions Mode (Autonomous)
All tools are auto-approved; the agent runs fully autonomously:
```yaml
permissions:
  mode: bypassPermissions
```
Use for: Trusted agents in controlled environments, scheduled jobs, CI/CD.
Plan Mode (Research Only)
The agent can plan but not execute tools:
```yaml
permissions:
  mode: plan
```
Use for: Exploring solutions without making changes, generating plans for review.
Tool Permissions
Fine-grained control over specific tools:
```yaml
permissions:
  mode: acceptEdits

  # Whitelist specific tools
  allowed_tools:
    - Bash
    - Read
    - Write
    - Edit
    - mcp__github__*  # Wildcard for all GitHub MCP tools

  # Blacklist dangerous tools
  denied_tools:
    - mcp__postgres__execute_query  # Prevent database writes
```
MCP Server Configuration
MCP (Model Context Protocol) servers extend agent capabilities with external tools.
Server Types
Process-Based Servers
Spawn a local process that communicates via stdio:
```yaml
mcp_servers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}  # Environment variable interpolation
```
HTTP-Based Servers
Connect to a remote MCP endpoint:
```yaml
mcp_servers:
  custom-api:
    url: http://localhost:8080/mcp
```
Tool Naming Convention
MCP tools are namespaced as mcp__<server>__<tool>:
```
mcp__github__create_issue
mcp__github__list_pull_requests
mcp__postgres__query
mcp__filesystem__read_file
```
Common MCP Servers
| Server | Package | Purpose |
|---|---|---|
| GitHub | @modelcontextprotocol/server-github | Issues, PRs, repos |
| Filesystem | @modelcontextprotocol/server-filesystem | File operations |
| PostgreSQL | @modelcontextprotocol/server-postgres | Database access |
| Memory | @modelcontextprotocol/server-memory | Persistent key-value store |
Full Configuration Example
```yaml
name: full-stack-agent

mcp_servers:
  # GitHub for issue management
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  # Database for analytics
  postgres:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-postgres"]
    env:
      DATABASE_URL: ${DATABASE_URL}

  # Custom internal API
  internal-api:
    url: ${INTERNAL_API_URL}

permissions:
  mode: acceptEdits
  allowed_tools:
    - mcp__github__*
    - mcp__postgres__query  # Read-only
  denied_tools:
    - mcp__postgres__execute  # No writes
```
Session Management
Sessions enable resuming conversations and forking agent state.
Session Concepts
- Session ID: Unique identifier from the Claude SDK for conversation context
- Resume: Continue a previous conversation with full context
- Fork: Branch from a previous state to explore alternatives
Resume Flow
Resume continues the exact conversation:
```
Job A (creates session)
  │
  ▼
Job B (resume from A) → Continues with full context
  │
  ▼
Job C (resume from B) → Continues with full context
```
Usage:
```ts
const result = await runner.execute({
  agent: myAgent,
  prompt: "Continue from where we left off",
  stateDir: ".herdctl",
  resume: "session-id-from-previous-job"
});
```
Fork Flow
Fork branches from a point in history:
```
Job A (creates session)
  │
  ├─► Job B (fork from A) → New branch with A's context
  │
  └─► Job C (fork from A) → Another branch with A's context
```
Usage:
```ts
const result = await runner.execute({
  agent: myAgent,
  prompt: "Try a different approach",
  stateDir: ".herdctl",
  fork: "session-id-to-fork-from"
});
```
Session Storage
Session info is persisted in .herdctl/sessions/<agent-name>.json:
{ "agent_name": "bragdoc-coder", "session_id": "claude-session-xyz789", "created_at": "2024-01-19T08:00:00Z", "last_used_at": "2024-01-19T10:05:00Z", "job_count": 15, "mode": "autonomous"}When to Use
| Scenario | Use |
|---|---|
| Continue a task | resume with previous session ID |
| Try alternative approaches | fork from a checkpoint |
| Start fresh | Neither (creates new session) |
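As a concrete example, an orchestration script could consult the persisted session file from the Session Storage section above to decide between resuming and starting fresh. This is a sketch under the assumption that the file path and JSON shape match the example above; runner and myAgent are the same objects used in the earlier snippets.

```ts
import { readFile } from 'node:fs/promises';

// Sketch: resume the agent's previous session when one is persisted, otherwise start fresh.
async function resumeOrStart(agentName: string, prompt: string) {
  let sessionId: string | undefined;
  try {
    const raw = await readFile(`.herdctl/sessions/${agentName}.json`, 'utf8');
    sessionId = JSON.parse(raw).session_id;
  } catch {
    // No session file yet: a brand new session will be created
  }

  return runner.execute({
    agent: myAgent,
    prompt,
    stateDir: '.herdctl',
    ...(sessionId ? { resume: sessionId } : {})
  });
}
```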
Output Streaming
The runner streams output in real-time using JSONL (newline-delimited JSON).
JSONL Format
Each line is a complete, self-contained JSON object:
{"type":"system","subtype":"init","timestamp":"2024-01-19T09:00:00Z"}{"type":"assistant","content":"Starting analysis...","partial":false,"timestamp":"2024-01-19T09:00:01Z"}{"type":"tool_use","tool_name":"Bash","tool_use_id":"toolu_123","input":"ls -la","timestamp":"2024-01-19T09:00:02Z"}{"type":"tool_result","tool_use_id":"toolu_123","result":"total 42...","success":true,"timestamp":"2024-01-19T09:00:03Z"}Message Types
The runner outputs five message types:
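system, assistant, tool_use, tool_result, and error, each documented below. Taken together they can be modeled as a TypeScript discriminated union roughly like this (a sketch derived from the examples that follow, not the exact exported types):

```ts
// Sketch of the JSONL record shapes, based on the examples below; field optionality is approximate.
type OutputMessage =
  | { type: 'system'; subtype: 'init' | 'end' | 'complete'; content?: string; timestamp: string }
  | { type: 'assistant'; content: string; partial: boolean;
      usage?: { input_tokens: number; output_tokens: number }; timestamp: string }
  | { type: 'tool_use'; tool_name: string; tool_use_id: string; input: string; timestamp: string }
  | { type: 'tool_result'; tool_use_id: string; result: string; success: boolean;
      error?: string | null; timestamp: string }
  | { type: 'error'; message: string; code?: string; stack?: string; timestamp: string };
```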
system
Session lifecycle events:
{ "type": "system", "subtype": "init", "content": "Session initialized", "timestamp": "2024-01-19T09:00:00Z"}Subtypes: init, end, complete
assistant
Claude’s text responses:
{ "type": "assistant", "content": "I'll analyze the codebase...", "partial": false, "usage": { "input_tokens": 1500, "output_tokens": 200 }, "timestamp": "2024-01-19T09:00:01Z"}partial: True for streaming chunks, false for complete messagesusage: Token counts (when available)
tool_use
Tool invocations by the agent:
{ "type": "tool_use", "tool_name": "Bash", "tool_use_id": "toolu_abc123", "input": "git status", "timestamp": "2024-01-19T09:00:02Z"}tool_result
Results from tool execution:
{ "type": "tool_result", "tool_use_id": "toolu_abc123", "result": "On branch main\nNothing to commit", "success": true, "error": null, "timestamp": "2024-01-19T09:00:05Z"}Error events:
{ "type": "error", "message": "API rate limit exceeded", "code": "RATE_LIMIT", "stack": "...", "timestamp": "2024-01-19T09:00:05Z"}Reading Output
Stream output in real-time using the async generator:
```ts
import { readJobOutput } from '@herdctl/core';

// Memory-efficient streaming read
for await (const message of readJobOutput(jobsDir, jobId)) {
  console.log(message.type, message.content || message.tool_name);
}
```
Or tail the file directly:
```sh
tail -f .herdctl/jobs/job-2024-01-19-abc123.jsonl | jq .
```
Error Handling
The runner provides structured error handling with detailed diagnostics.
Error Hierarchy
```
RunnerError (base)
├── SDKInitializationError
│   └── Missing API key, network issues
├── SDKStreamingError
│   └── Rate limits, connection drops
└── MalformedResponseError
    └── Invalid SDK message format
```
Error Classification
Errors are classified to determine the appropriate exit reason:
| Exit Reason | Trigger |
|---|---|
| success | Job completed normally |
| error | Unrecoverable error |
| timeout | Execution time exceeded |
| cancelled | User or system cancellation |
| max_turns | Reached maximum conversation turns |
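An orchestration layer can branch on the recorded exit reason. Here is a sketch under the assumption that the finished job record exposes it as an exit_reason field; that field name and the scheduleFollowUp/notifyOnCall helpers are hypothetical, not part of the documented API.

```ts
// Sketch: react to a finished job by exit reason.
// `job.exit_reason` and both helpers are hypothetical names.
switch (job.exit_reason) {
  case 'success':
  case 'cancelled':
    break;                                           // Nothing to do
  case 'timeout':
  case 'max_turns':
    await scheduleFollowUp(job.id, job.session_id);  // Resume where the run left off
    break;
  case 'error':
    await notifyOnCall(job.id);                      // Surface unrecoverable failures
    break;
}
```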
Error Detection
The runner detects common error patterns:
```ts
// Missing API key
if (error.isMissingApiKey()) {
  // Prompt user to set ANTHROPIC_API_KEY
}

// Rate limiting
if (error.isRateLimited()) {
  // Implement backoff or wait
}

// Network issues
if (error.isNetworkError()) {
  // Check connectivity
}

// Recoverable errors
if (error.isRecoverable()) {
  // Can retry the operation
}
```
Troubleshooting Guide
“Missing API Key” Errors
```
SDKInitializationError: Missing or invalid API key
```
Solution: Set your Anthropic API key:
```sh
export ANTHROPIC_API_KEY=sk-ant-...
```
Rate Limit Errors
```
SDKStreamingError: Rate limit exceeded
```
Solutions:
- Wait and retry (the error includes retry-after when available)
- Reduce concurrent agent runs
- Use a higher-tier API plan
Connection Errors
```
SDKStreamingError: Connection refused (ECONNREFUSED)
```
Solutions:
- Check network connectivity
- Verify MCP server URLs are accessible
- Check firewall rules
Malformed Response Errors
```
MalformedResponseError: Invalid message format
```
This usually indicates an SDK version mismatch or API changes. The runner logs these but continues processing other messages.
Error Recovery Patterns
Automatic Retry (Not Implemented Yet)
The runner currently does not retry failed operations. For critical workflows, implement retry logic at the orchestration layer:
```ts
async function runWithRetry(options: RunnerOptions, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = await runner.execute(options);

    if (result.success) return result;

    if (
      result.error instanceof SDKStreamingError &&
      result.error.isRecoverable() &&
      attempt < maxRetries
    ) {
      await sleep(1000 * attempt); // Back off longer on each attempt
      continue;
    }

    throw result.error;
  }
}
```
Graceful Degradation
Handle partial failures gracefully:
```ts
const result = await runner.execute(options);

if (!result.success && result.errorDetails?.code === 'RATE_LIMIT') {
  // Save progress and schedule retry
  await scheduleRetry(result.jobId, result.sessionId);
}
```
Runner Result
The runner returns a structured result:
```ts
interface RunnerResult {
  success: boolean;           // Whether the run completed successfully
  jobId: string;              // The job ID for this run
  sessionId?: string;         // Session ID for resume/fork
  summary?: string;           // Brief summary of accomplishments
  error?: Error;              // Error if run failed
  errorDetails?: {            // Detailed error info
    code: string;
    message: string;
    recoverable: boolean;
  };
  durationSeconds?: number;   // Total execution time
}
```
Related Documentation
- State Management - How state is persisted
- Sessions - Session configuration and lifecycle
- Jobs - Job properties and status
- Permissions - Detailed permission configuration
- MCP Servers - MCP server setup guide