Docker Container Runtime
The Docker container runtime enables herdctl to execute Claude Code agents inside isolated Docker containers rather than directly on the host. It provides filesystem isolation, controlled environment variable passing, resource limits, and a consistent execution environment across deployments.
This page covers the internal architecture of the Docker runtime. For the runner system as a whole, see Agent Execution Engine. For the overall system design, see System Architecture Overview.
Decorator Pattern
Section titled “Decorator Pattern”ContainerRunner implements the decorator pattern. It wraps any base runtime (SDKRuntime or CLIRuntime) and transparently redirects execution into a Docker container. The RuntimeFactory composes this automatically when an agent’s configuration has docker.enabled: true:
agent.runtime = "sdk" + docker.enabled ──► ContainerRunner(SDKRuntime)agent.runtime = "cli" + docker.enabled ──► ContainerRunner(CLIRuntime)From the JobExecutor’s perspective, a ContainerRunner is just another RuntimeInterface. The same execute() method returns the same AsyncIterable<SDKMessage> stream — the Docker layer is invisible to callers.
// RuntimeFactory.create() handles this composition automaticallyconst runtime = RuntimeFactory.create(agent, { stateDir });
// Whether this is SDKRuntime, CLIRuntime, or ContainerRunner(either),// the interface is identical:for await (const message of runtime.execute(options)) { // process messages}How Wrapping Works
Section titled “How Wrapping Works”ContainerRunner does not call the wrapped runtime’s execute() method directly. Instead, it re-implements the execution strategy for each runtime type inside a Docker container:
| Wrapped Runtime | Docker Execution Strategy |
|---|---|
| CLIRuntime | Spawns claude via docker exec with a custom process spawner. Session files are written inside the container but mounted to the host so the CLI session watcher can observe them. |
| SDKRuntime | Serializes SDK options to JSON, passes them via the HERDCTL_SDK_OPTIONS environment variable, and runs docker-sdk-wrapper.js inside the container via docker exec. The wrapper script calls the SDK’s query() function and streams messages as JSONL to stdout. |
Docker Image
Section titled “Docker Image”The runtime image (herdctl/runtime:latest) is built from the Dockerfile at the repository root. It provides a complete execution environment for Claude Code agents.
Contents
Section titled “Contents”| Component | Version | Purpose |
|---|---|---|
| Node.js | 22 (slim) | JavaScript runtime for Claude CLI and SDK |
| Claude CLI | @anthropic-ai/claude-code | Official Anthropic CLI for agent execution |
| Claude Agent SDK | @anthropic-ai/claude-agent-sdk | SDK for programmatic agent execution |
| GitHub CLI | gh | GitHub API operations (issues, PRs, releases) |
| Git | System package | Version control operations |
| docker-sdk-wrapper.js | Bundled from source | Bridge script for SDK runtime in Docker |
Key Directories
Section titled “Key Directories”| Path | Purpose |
|---|---|
/workspace | Working directory, world-writable (mount point for host project) |
/home/claude/.claude/projects/ | Claude CLI configuration and session data |
/usr/local/lib/docker-sdk-wrapper.js | SDK wrapper script for Docker execution |
Entrypoint
Section titled “Entrypoint”The image includes an entrypoint script that configures Git authentication when GITHUB_TOKEN is present:
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]The entrypoint sets git config --global url."https://x-access-token:${GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/", enabling agents to clone, fetch, and push to GitHub repositories without manual credential setup.
Keep-Alive Process
Section titled “Keep-Alive Process”The container runs sleep infinity (or tail -f /dev/null in older builds) as its main process. This keeps the container alive so that herdctl can execute commands via docker exec. The container itself does no work — it is an execution shell that waits for instructions.
Building the Image
Section titled “Building the Image”docker build -t herdctl/runtime:latest -f Dockerfile .The image must be built locally before Docker runtime can be used. It is not published to a public registry.
Container Lifecycle
Section titled “Container Lifecycle”ContainerManager handles all Docker API interactions through the dockerode library. The lifecycle follows a create-start-execute-stop-remove pattern.
Creation
Section titled “Creation”When herdctl needs to execute an agent with Docker enabled, ContainerManager.getOrCreateContainer() either reuses an existing container (persistent mode) or creates a new one (ephemeral mode).
Container names follow the pattern herdctl-<agent-name>-<timestamp>, for example herdctl-assistant-1708012345678.
Persistent vs Ephemeral Containers
Section titled “Persistent vs Ephemeral Containers”| Mode | Config | Behavior |
|---|---|---|
| Ephemeral | ephemeral: true (default) | Fresh container per job. AutoRemove: true cleans up after stop. No accumulated state between executions. |
| Persistent | ephemeral: false | Container is reused across jobs for the same agent. Faster startup (no container creation overhead). State persists between executions. |
Execution
Section titled “Execution”Commands run inside the container via docker exec:
- CLI runtime:
docker exec <container> sh -c 'cd /workspace && printf %s "prompt" | claude <args>' - SDK runtime:
docker exec <container> bash -l -c 'export HERDCTL_SDK_OPTIONS=... && node /usr/local/lib/docker-sdk-wrapper.js'
The SDK runtime uses bash -l (login shell) to ensure the full environment (including PATH) is available.
Cleanup
Section titled “Cleanup”After execution completes:
- Ephemeral containers: stopped immediately, triggering
AutoRemovefor automatic cleanup. - All containers:
cleanupOldContainers()runs, removing the oldest containers when the count exceedsmaxContainers(default: 5) per agent. - Forced removal: containers that fail to stop gracefully are removed with
force: true.
Network Configuration
Section titled “Network Configuration”Network mode controls how agent containers connect to the outside world.
| Mode | Config Value | Behavior |
|---|---|---|
| Bridge | bridge (default) | Standard Docker networking with NAT. The container has its own network namespace but can reach the internet through Docker’s bridge network. |
| Host | host | Container shares the host’s network namespace. Use when agents need to access services bound to localhost on the host. |
| Custom | Via host_config.NetworkMode | Any Docker network name (e.g., herdctl-net). Used in production deployments where herdctl and agent containers share a named network for DNS-based service discovery. |
Custom Networks in Production
Section titled “Custom Networks in Production”In production deployments, herdctl and its agent containers typically share a Docker bridge network (e.g., herdctl-net). This enables DNS-based service discovery — agents can reach MCP servers by container name:
defaults: docker: network: bridge # passes schema validation host_config: NetworkMode: herdctl-net # actual network overrideThe schema currently validates against a fixed enum of none, bridge, and host. Custom network names are passed via the host_config.NetworkMode override to bypass this validation.
Volume Mounts
Section titled “Volume Mounts”buildContainerMounts() constructs the volume mount list for each container. Three categories of mounts are created automatically:
Automatic Mounts
Section titled “Automatic Mounts”| Mount | Host Path | Container Path | Mode | Purpose |
|---|---|---|---|---|
| Workspace | Agent’s working_directory | /workspace | rw (configurable) | The project directory the agent works in |
| Docker sessions | <stateDir>/docker-sessions | /home/claude/.claude/projects/-workspace | rw | Claude CLI session files, mounted so the host can watch session changes |
Custom Mounts
Section titled “Custom Mounts”Additional volumes are specified in the fleet-level Docker configuration:
defaults: docker: volumes: - "/host/data:/container/data:ro" - "/host/config:/container/config:rw" workspace_mode: rw # or "ro" for read-only workspaceEach volume string is parsed into a PathMapping with hostPath, containerPath, and mode (ro or rw).
Credentials File Mount
Section titled “Credentials File Mount”For OAuth token management, the host’s ~/.claude/.credentials.json is bind-mounted (read-write) into the herdctl container. This allows buildContainerEnv() to read and refresh tokens without restarting the herdctl process. See OAuth Token Management below.
Security Model
Section titled “Security Model”Docker containers provide a different security model than Claude Code’s native sandboxing. The two approaches are complementary, not mutually exclusive.
Container Security Hardening
Section titled “Container Security Hardening”Every container is created with the following security settings:
{ SecurityOpt: ["no-new-privileges:true"], CapDrop: ["ALL"], ReadonlyRootfs: false, // Claude needs to write temp files AutoRemove: config.ephemeral,}| Setting | Effect |
|---|---|
no-new-privileges | Prevents processes from gaining additional privileges via setuid, setgid, or filesystem capabilities |
CapDrop: ALL | Drops all Linux capabilities. The container cannot perform privileged operations like mounting filesystems, changing network config, or loading kernel modules. |
User: UID:GID | Container runs as a non-root user. By default, matches the host user’s UID/GID via process.getuid() / process.getgid(). |
Resource Limits
Section titled “Resource Limits”| Resource | Config | Docker API | Default |
|---|---|---|---|
| Memory | memory: "2g" | Memory + MemorySwap (no swap) | 2 GB |
| CPU shares | cpu_shares: 512 | CpuShares | Unlimited |
| CPU hard limit | cpu_period + cpu_quota | CpuPeriod + CpuQuota | Unlimited |
| Max processes | pids_limit: 100 | PidsLimit | Unlimited |
Memory swap is set equal to the memory limit, effectively disabling swap. This prevents containers from consuming host swap space.
Docker vs Native Sandboxing
Section titled “Docker vs Native Sandboxing”Claude Code includes native sandboxing via bubblewrap (Linux) and Seatbelt (macOS). Docker provides a different set of protections:
| Security Property | Native Sandbox | Docker |
|---|---|---|
| Filesystem isolation | Access controls on the same filesystem | Separate root filesystem; only explicit mounts are visible |
| Network control | Userspace domain-filtering proxy | Kernel-level network namespaces |
| Environment variables | No filtering; full host environment accessible | Fresh environment; only explicitly passed variables |
| Resource limits | None | cgroups (memory, CPU, PID limits) |
| Ephemeral execution | Persistent state across runs | Fresh container per job (ephemeral mode) |
| Process isolation | Partial (platform-dependent) | Full PID namespace isolation |
Docker’s primary advantages are environment variable isolation (the container has no access to host environment variables unless explicitly passed), resource limits (preventing runaway memory or CPU usage), and true filesystem isolation (host files do not exist inside the container unless mounted).
Native sandboxing provides tool-level permission control that Docker does not — the permissionMode and allowedTools / deniedTools settings operate at the Claude Code level regardless of whether Docker is used.
hostConfigOverride
Section titled “hostConfigOverride”The host_config field in fleet-level Docker configuration passes raw dockerode HostConfig options directly to the Docker API. This can override security settings:
defaults: docker: host_config: NetworkMode: herdctl-net ShmSize: 67108864This field is intentionally restricted to fleet-level configuration only. Agent-level configs use a strict schema (AgentDockerSchema) that rejects unknown fields, preventing untrusted agent configs from weakening container security.
Environment Variable Passing
Section titled “Environment Variable Passing”buildContainerEnv() constructs the environment variable array for each container. Variables are passed as KEY=value strings.
Automatically Passed Variables
Section titled “Automatically Passed Variables”| Variable | Source | Condition |
|---|---|---|
ANTHROPIC_API_KEY | process.env | If set in herdctl’s environment |
CLAUDE_CODE_OAUTH_TOKEN | Credentials file or process.env | OAuth access token |
CLAUDE_REFRESH_TOKEN | Credentials file or process.env | OAuth refresh token |
CLAUDE_EXPIRES_AT | Credentials file or process.env | Token expiration timestamp |
TERM | Hardcoded | Always xterm-256color |
HOME | Hardcoded | Always /home/claude |
Custom Variables
Section titled “Custom Variables”Additional variables from the Docker config’s env field:
defaults: docker: env: GITHUB_TOKEN: "${GITHUB_TOKEN}" MY_CUSTOM_VAR: "some-value"Values support ${VAR} interpolation from the host environment.
Git Authentication
Section titled “Git Authentication”The Docker image automatically configures Git HTTPS authentication when GITHUB_TOKEN is present in the container’s environment. The entrypoint script runs:
git config --global url."https://x-access-token:${GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/"This rewrites all https://github.com/ URLs to include the token, enabling git clone, git fetch, and git push operations without interactive authentication.
To pass the token, include it in the Docker environment configuration:
defaults: docker: env: GITHUB_TOKEN: "${GITHUB_TOKEN}"MCP HTTP Bridge
Section titled “MCP HTTP Bridge”When MCP servers are injected at runtime (for example, the Slack file sender), the SDKRuntime and ContainerRunner handle them differently:
- SDKRuntime (non-Docker): converts
InjectedMcpServerDefto an in-process MCP server using the Claude Agent SDK’stool()andcreateSdkMcpServer()functions. The server runs in the same process as the SDK. - ContainerRunner (Docker): in-process function closures cannot be serialized into a Docker container. The solution is the MCP HTTP bridge.
The Problem
Section titled “The Problem”An InjectedMcpServerDef contains handler functions — JavaScript closures that capture state from their enclosing scope (like a Slack API client). These closures cannot be serialized to JSON and passed into a Docker container.
The Solution
Section titled “The Solution”ContainerRunner starts an HTTP server on the herdctl side that implements the MCP Streamable HTTP transport (JSON-RPC 2.0 over POST). The agent container connects to this server via Docker network DNS:
herdctl container agent container┌─────────────────────┐ ┌─────────────────────┐│ MCP HTTP Bridge │◄── HTTP ────►│ Claude Agent SDK ││ port: <random> │ │ MCP client ││ host: 0.0.0.0 │ │ url: http://herdctl ││ │ │ :<port>/mcp ││ routes to in-process│ └─────────────────────┘│ handler functions │└─────────────────────┘Bridge Lifecycle
Section titled “Bridge Lifecycle”- Start:
startMcpHttpBridge(def)creates an HTTP server bound to0.0.0.0:0(random available port). - Inject: The bridge URL (
http://herdctl:<port>/mcp) is added tosdkOptions.mcpServersas an HTTP-type MCP server config. - Execute: The agent container calls the bridge via HTTP during execution. The bridge translates tool calls to the in-process handler functions.
- Cleanup: All bridges are closed in a
finallyblock after execution completes, regardless of success or failure.
Supported MCP Methods
Section titled “Supported MCP Methods”The bridge implements a minimal subset of the MCP protocol:
| Method | Behavior |
|---|---|
initialize | Returns server info and capabilities |
notifications/initialized | Returns 204 No Content (JSON-RPC notification) |
tools/list | Returns tool definitions from the InjectedMcpServerDef |
tools/call | Executes the tool’s handler function with path translation |
ping | Returns empty result |
Docker Path Translation
Section titled “Docker Path Translation”The bridge includes path translation for the sibling container model. When the agent calls a tool with a file path like /workspace/report.pdf, the bridge strips the /workspace/ prefix before passing it to the handler. The handler runs on the host side where paths are relative to the working directory, not the container’s mount point.
Tool Permission Auto-Addition
Section titled “Tool Permission Auto-Addition”When injected MCP servers are present and the agent has an explicit allowedTools list, ContainerRunner automatically adds mcp__<name>__* patterns for each injected server. Without this, agents with restrictive tool lists would be unable to call injected tools — the allowedTools filter would block them.
OAuth Token Management
Section titled “OAuth Token Management”Claude OAuth access tokens have an 8-hour TTL. For long-running herdctl deployments, tokens must be refreshed automatically to avoid agent authentication failures.
Two-Layer Architecture
Section titled “Two-Layer Architecture”The token management system uses two complementary strategies:
Layer 1: Proactive Refresh
Section titled “Layer 1: Proactive Refresh”On every agent spawn, buildContainerEnv() reads the OAuth credentials and checks expiry:
- Reads
~/.claude/.credentials.json(bind-mounted from the host). - Checks if the access token expires within a 5-minute buffer.
- If expired or expiring, calls the Claude OAuth refresh endpoint.
- Writes the refreshed tokens back to the credentials file (persisted via bind mount).
- Passes the fresh token as environment variables to the agent container.
| Parameter | Value |
|---|---|
| Refresh endpoint | POST https://console.anthropic.com/v1/oauth/token |
| Grant type | refresh_token |
| Client ID | 9d1c250a-e61b-44d9-88ed-5944d1962f5e (Claude Code CLI public client) |
| Access token TTL | 8 hours |
| Refresh buffer | 5 minutes before expiry |
| Refresh token rotation | Each refresh returns a new refresh token; the old one is invalidated |
Layer 2: Reactive Retry
Section titled “Layer 2: Reactive Retry”If an agent session runs longer than 8 hours and the token expires mid-execution:
- The Claude SDK fails with an authentication error.
isTokenExpiredError()in the job executor detects the error pattern.- The job is retried automatically (one retry maximum).
- The retry creates a new container, triggering
buildContainerEnv()which refreshes the token. - The agent resumes with a fresh token in a new session.
Backwards Compatibility
Section titled “Backwards Compatibility”If the credentials file is not available (no bind mount), buildContainerEnv() falls back to reading from process.env. This preserves compatibility with deployments that pass static tokens via environment variables.
Edge Cases
Section titled “Edge Cases”| Scenario | Behavior |
|---|---|
| Token valid | Read from file, pass as env vars. No refresh. |
| Token expired, refresh succeeds | Refresh, write back to file, pass fresh token. |
| Token expired, refresh fails | Pass the expired token anyway (agent will fail). Fall back to env vars if file is unreadable. |
| Agent runs >8h, token expires mid-session | Auth error detected, job retried with fresh token. |
| Refresh token itself expired | Requires manual re-auth on host (claude interactive login), then restart herdctl. |
| Multiple concurrent agent spawns | Each reads the file independently. First to refresh writes back; others read the already-refreshed file. No locking needed since refresh is idempotent. |
Sibling Container Pattern
Section titled “Sibling Container Pattern”herdctl uses the sibling container pattern, not Docker-in-Docker (DinD). The herdctl container mounts the Docker socket (/var/run/docker.sock) and calls the Docker API to spawn agent containers as peers on the same Docker daemon:
Docker daemon (host)├── herdctl container (mounts /var/run/docker.sock)├── agent-1 container (spawned by herdctl via Docker API)├── agent-2 container (spawned by herdctl via Docker API)└── mcp-server container (independently deployed)Agent containers are siblings of the herdctl container, not children nested inside it. This is important because:
- No privileged mode required — Docker-in-Docker requires
--privileged, which weakens security. Sibling containers need only the Docker socket mount. - Shared network — All containers can join the same Docker network for DNS-based service discovery.
- Resource visibility — The host Docker daemon manages all container resources directly.
Path Mapping Constraint
Section titled “Path Mapping Constraint”The sibling container pattern introduces a critical constraint: all volume mount paths must be real host paths, not paths inside the herdctl container.
When herdctl calls the Docker API to create an agent container, the Docker daemon resolves mount source paths relative to the host filesystem, not relative to the herdctl container. This means:
working_directoryin agent config must be the host path (e.g.,/home/dev/projects/myapp), not a path inside the herdctl container (e.g.,/workspace).- The state directory must use host paths, mounted at matching paths in both the herdctl and agent containers.
- Docker named volumes do not work for state because the Docker daemon interprets mount sources as host paths.
# Correct: host pathagents: - name: my-agent working_directory: /home/dev/projects/myapp
# Incorrect: container-internal path (Docker daemon can't resolve this)agents: - name: my-agent working_directory: /workspaceConfiguration Reference
Section titled “Configuration Reference”Agent-Level Options (Safe)
Section titled “Agent-Level Options (Safe)”These options can be set in agent config files. The schema uses strict() mode to reject unknown fields at the agent level.
docker: enabled: true ephemeral: false memory: 2g cpu_shares: 512 pids_limit: 100 max_containers: 5 workspace_mode: rw tmpfs: - "/tmp" labels: team: backendFleet-Level Options (Full)
Section titled “Fleet-Level Options (Full)”These options are set in herdctl.yaml under defaults.docker. They include all agent-level options plus security-sensitive options:
defaults: docker: enabled: true image: herdctl/runtime:latest network: bridge memory: 2g user: "1000:1000" ephemeral: false volumes: - "/host/data:/data:ro" env: GITHUB_TOKEN: "${GITHUB_TOKEN}" ports: - "8080:80" host_config: NetworkMode: herdctl-netConfiguration Hierarchy
Section titled “Configuration Hierarchy”Docker configuration follows the same merge strategy as other agent settings: agent-level values override fleet-level defaults. However, security-sensitive fields (image, network, volumes, env, user, ports, host_config) are only accepted at the fleet level.
Source Code Layout
Section titled “Source Code Layout”| File | Purpose |
|---|---|
Dockerfile | Runtime image definition |
packages/core/src/runner/runtime/container-runner.ts | ContainerRunner decorator — execution delegation for CLI and SDK runtimes in Docker |
packages/core/src/runner/runtime/container-manager.ts | ContainerManager — container lifecycle (create, start, stop, remove, cleanup), buildContainerMounts(), buildContainerEnv(), OAuth token refresh |
packages/core/src/runner/runtime/docker-config.ts | DockerConfig type, resolveDockerConfig(), parsers for memory, ports, volumes, tmpfs |
packages/core/src/runner/runtime/mcp-http-bridge.ts | startMcpHttpBridge() — HTTP server implementing MCP Streamable HTTP transport for Docker |
packages/core/src/runner/runtime/docker-sdk-wrapper.js | In-container wrapper script that runs the Claude Agent SDK and streams JSONL to stdout |
packages/core/src/runner/runtime/factory.ts | RuntimeFactory — composes ContainerRunner around base runtimes when Docker is enabled |
packages/core/src/runner/runtime/interface.ts | RuntimeInterface and RuntimeExecuteOptions types |
packages/core/src/runner/types.ts | InjectedMcpServerDef, InjectedMcpToolDef, McpToolCallResult types |
packages/core/src/config/schema.ts | AgentDockerSchema, FleetDockerSchema — Zod validation schemas |
packages/core/src/state/session-validation.ts | isTokenExpiredError() — detects OAuth token expiry errors for retry |
Related Pages
Section titled “Related Pages”- Agent Execution Engine — Runner module, runtime selection, job execution lifecycle
- System Architecture Overview — FleetManager, packages, design principles
- State Persistence —
.herdctl/directory, session storage - Chat Infrastructure — How chat integrations inject MCP servers
- Slack Connector — Production deployment architecture using Docker