
Docker Container Runtime

The Docker container runtime enables herdctl to execute Claude Code agents inside isolated Docker containers rather than directly on the host. It provides filesystem isolation, controlled environment variable passing, resource limits, and a consistent execution environment across deployments.

This page covers the internal architecture of the Docker runtime. For the runner system as a whole, see Agent Execution Engine. For the overall system design, see System Architecture Overview.

*Diagram: Docker container architecture showing host process, container internals, volume mounts, network bridge, and MCP HTTP bridge.*

ContainerRunner implements the decorator pattern. It wraps any base runtime (SDKRuntime or CLIRuntime) and transparently redirects execution into a Docker container. The RuntimeFactory composes this automatically when an agent’s configuration has docker.enabled: true:

```
agent.runtime = "sdk" + docker.enabled ──► ContainerRunner(SDKRuntime)
agent.runtime = "cli" + docker.enabled ──► ContainerRunner(CLIRuntime)
```

From the JobExecutor’s perspective, a ContainerRunner is just another RuntimeInterface. The same execute() method returns the same AsyncIterable<SDKMessage> stream — the Docker layer is invisible to callers.

```typescript
// RuntimeFactory.create() handles this composition automatically
const runtime = RuntimeFactory.create(agent, { stateDir });

// Whether this is SDKRuntime, CLIRuntime, or ContainerRunner(either),
// the interface is identical:
for await (const message of runtime.execute(options)) {
  // process messages
}
```

*Diagram: ContainerRunner decorator pattern showing how it wraps base runtimes with container lifecycle management.*

ContainerRunner does not call the wrapped runtime’s execute() method directly. Instead, it re-implements the execution strategy for each runtime type inside a Docker container:

| Wrapped Runtime | Docker Execution Strategy |
| --- | --- |
| `CLIRuntime` | Spawns `claude` via `docker exec` with a custom process spawner. Session files are written inside the container but mounted to the host so the CLI session watcher can observe them. |
| `SDKRuntime` | Serializes SDK options to JSON, passes them via the `HERDCTL_SDK_OPTIONS` environment variable, and runs `docker-sdk-wrapper.js` inside the container via `docker exec`. The wrapper script calls the SDK's `query()` function and streams messages as JSONL to stdout. |
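The SDK strategy can be sketched as follows. This is a hedged illustration, not the actual implementation: the `HERDCTL_SDK_OPTIONS` name and wrapper path come from this page, while `buildDockerExecArgs` and the `SdkOptions` shape are hypothetical, and passing the payload through `docker exec -e` (rather than an inline `export`) is a simplification to sidestep shell quoting.

```typescript
// Hypothetical sketch: serialize SDK options to JSON and hand them to the
// in-container wrapper via a single environment variable.
interface SdkOptions {
  prompt: string;
  allowedTools?: string[];
}

function buildDockerExecArgs(containerId: string, options: SdkOptions): string[] {
  // Closures cannot survive this trip; only plain data serializes to JSON.
  const payload = JSON.stringify(options);
  return [
    "exec",
    "-e", `HERDCTL_SDK_OPTIONS=${payload}`,
    containerId,
    // bash -l loads the login environment so PATH includes node
    "bash", "-l", "-c",
    "node /usr/local/lib/docker-sdk-wrapper.js",
  ];
}
```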

The runtime image (herdctl/runtime:latest) is built from the Dockerfile at the repository root. It provides a complete execution environment for Claude Code agents.

| Component | Version | Purpose |
| --- | --- | --- |
| Node.js | 22 (slim) | JavaScript runtime for Claude CLI and SDK |
| Claude CLI | `@anthropic-ai/claude-code` | Official Anthropic CLI for agent execution |
| Claude Agent SDK | `@anthropic-ai/claude-agent-sdk` | SDK for programmatic agent execution |
| GitHub CLI | `gh` | GitHub API operations (issues, PRs, releases) |
| Git | System package | Version control operations |
| `docker-sdk-wrapper.js` | Bundled from source | Bridge script for SDK runtime in Docker |

| Path | Purpose |
| --- | --- |
| `/workspace` | Working directory, world-writable (mount point for host project) |
| `/home/claude/.claude/projects/` | Claude CLI configuration and session data |
| `/usr/local/lib/docker-sdk-wrapper.js` | SDK wrapper script for Docker execution |

The image includes an entrypoint script that configures Git authentication when GITHUB_TOKEN is present:

```dockerfile
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
```

The entrypoint sets git config --global url."https://x-access-token:${GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/", enabling agents to clone, fetch, and push to GitHub repositories without manual credential setup.

The container runs sleep infinity (or tail -f /dev/null in older builds) as its main process. This keeps the container alive so that herdctl can execute commands via docker exec. The container itself does no work — it is an execution shell that waits for instructions.

```shell
docker build -t herdctl/runtime:latest -f Dockerfile .
```

The image must be built locally before the Docker runtime can be used. It is not published to a public registry.

ContainerManager handles all Docker API interactions through the dockerode library. The lifecycle follows a create-start-execute-stop-remove pattern.

When herdctl needs to execute an agent with Docker enabled, ContainerManager.getOrCreateContainer() either reuses an existing container (persistent mode) or creates a new one (ephemeral mode).

Container names follow the pattern herdctl-<agent-name>-<timestamp>, for example herdctl-assistant-1708012345678.
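As a small illustration, the naming scheme amounts to the following (the function name is hypothetical; the pattern and example come from this page):

```typescript
// Container names combine a fixed prefix, the agent name, and a
// millisecond timestamp so concurrent containers never collide.
function containerName(agentName: string, now: number = Date.now()): string {
  return `herdctl-${agentName}-${now}`;
}
```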

| Mode | Config | Behavior |
| --- | --- | --- |
| Ephemeral | `ephemeral: true` (default) | Fresh container per job. `AutoRemove: true` cleans up after stop. No accumulated state between executions. |
| Persistent | `ephemeral: false` | The container is reused across jobs for the same agent. Faster startup (no container creation overhead). State persists between executions. |

Commands run inside the container via docker exec:

  • CLI runtime: `docker exec <container> sh -c 'cd /workspace && printf %s "prompt" | claude <args>'`
  • SDK runtime: `docker exec <container> bash -l -c 'export HERDCTL_SDK_OPTIONS=... && node /usr/local/lib/docker-sdk-wrapper.js'`

The SDK runtime uses bash -l (login shell) to ensure the full environment (including PATH) is available.

After execution completes:

  1. Ephemeral containers: stopped immediately, triggering AutoRemove for automatic cleanup.
  2. All containers: cleanupOldContainers() runs, removing the oldest containers when the count exceeds maxContainers (default: 5) per agent.
  3. Forced removal: containers that fail to stop gracefully are removed with force: true.
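The retention policy in step 2 can be sketched as follows. This is an illustrative reconstruction: the `ContainerInfo` shape and `selectContainersToRemove` name are assumptions, not dockerode's or herdctl's actual types.

```typescript
// Keep at most maxContainers per agent; select the oldest for removal.
interface ContainerInfo {
  name: string;
  createdAt: number; // epoch ms, e.g. parsed from the name's timestamp suffix
}

function selectContainersToRemove(
  containers: ContainerInfo[],
  maxContainers = 5,
): ContainerInfo[] {
  if (containers.length <= maxContainers) return [];
  return [...containers]
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first
    .slice(0, containers.length - maxContainers);
}
```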

Network mode controls how agent containers connect to the outside world.

*Diagram: Network modes comparison showing bridge mode with NAT isolation versus host mode with a shared network namespace.*

| Mode | Config Value | Behavior |
| --- | --- | --- |
| Bridge | `bridge` (default) | Standard Docker networking with NAT. The container has its own network namespace but can reach the internet through Docker's bridge network. |
| Host | `host` | The container shares the host's network namespace. Use when agents need to access services bound to localhost on the host. |
| Custom | Via `host_config.NetworkMode` | Any Docker network name (e.g., `herdctl-net`). Used in production deployments where herdctl and agent containers share a named network for DNS-based service discovery. |

In production deployments, herdctl and its agent containers typically share a Docker bridge network (e.g., herdctl-net). This enables DNS-based service discovery — agents can reach MCP servers by container name:

```yaml
defaults:
  docker:
    network: bridge             # passes schema validation
    host_config:
      NetworkMode: herdctl-net  # actual network override
```

The schema currently validates against a fixed enum of none, bridge, and host. Custom network names are passed via the host_config.NetworkMode override to bypass this validation.

buildContainerMounts() constructs the volume mount list for each container. The following mounts are created automatically:

| Mount | Host Path | Container Path | Mode | Purpose |
| --- | --- | --- | --- | --- |
| Workspace | Agent's `working_directory` | `/workspace` | `rw` (configurable) | The project directory the agent works in |
| Docker sessions | `<stateDir>/docker-sessions` | `/home/claude/.claude/projects/-workspace` | `rw` | Claude CLI session files, mounted so the host can watch session changes |

Additional volumes are specified in the fleet-level Docker configuration:

```yaml
defaults:
  docker:
    volumes:
      - "/host/data:/container/data:ro"
      - "/host/config:/container/config:rw"
    workspace_mode: rw  # or "ro" for a read-only workspace
```

Each volume string is parsed into a PathMapping with hostPath, containerPath, and mode (ro or rw).
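A minimal sketch of that parsing, assuming the `host:container[:mode]` format shown above (the real parser in `docker-config.ts` may handle more edge cases, such as Windows drive letters):

```typescript
// Parse a "host:container[:mode]" volume string into a PathMapping.
interface PathMapping {
  hostPath: string;
  containerPath: string;
  mode: "ro" | "rw";
}

function parseVolume(spec: string): PathMapping {
  const parts = spec.split(":");
  if (parts.length < 2 || parts.length > 3) {
    throw new Error(`Invalid volume spec: ${spec}`);
  }
  const mode = parts[2] === "ro" ? "ro" : "rw"; // default to read-write
  return { hostPath: parts[0], containerPath: parts[1], mode };
}
```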

For OAuth token management, the host’s ~/.claude/.credentials.json is bind-mounted (read-write) into the herdctl container. This allows buildContainerEnv() to read and refresh tokens without restarting the herdctl process. See OAuth Token Management below.

Docker containers provide a different security model than Claude Code’s native sandboxing. The two approaches are complementary, not mutually exclusive.

Every container is created with the following security settings:

```typescript
{
  SecurityOpt: ["no-new-privileges:true"],
  CapDrop: ["ALL"],
  ReadonlyRootfs: false, // Claude needs to write temp files
  AutoRemove: config.ephemeral,
}
```
| Setting | Effect |
| --- | --- |
| `no-new-privileges` | Prevents processes from gaining additional privileges via setuid, setgid, or filesystem capabilities |
| `CapDrop: ALL` | Drops all Linux capabilities. The container cannot perform privileged operations like mounting filesystems, changing network config, or loading kernel modules. |
| `User: UID:GID` | The container runs as a non-root user. By default, this matches the host user's UID/GID via `process.getuid()` / `process.getgid()`. |

| Resource | Config | Docker API | Default |
| --- | --- | --- | --- |
| Memory | `memory: "2g"` | `Memory` + `MemorySwap` (no swap) | 2 GB |
| CPU shares | `cpu_shares: 512` | `CpuShares` | Unlimited |
| CPU hard limit | `cpu_period` + `cpu_quota` | `CpuPeriod` + `CpuQuota` | Unlimited |
| Max processes | `pids_limit: 100` | `PidsLimit` | Unlimited |

Memory swap is set equal to the memory limit, effectively disabling swap. This prevents containers from consuming host swap space.
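The limit assembly can be sketched like this. Field names (`Memory`, `MemorySwap`, `CpuShares`, `PidsLimit`) follow dockerode's `HostConfig`; the `parseMemory` and `buildResourceLimits` helpers are illustrative assumptions.

```typescript
// Parse a human-readable memory spec ("2g", "512m") into bytes.
function parseMemory(spec: string): number {
  const units: Record<string, number> = { k: 1024, m: 1024 ** 2, g: 1024 ** 3 };
  const match = /^(\d+)([kmg])?$/i.exec(spec);
  if (!match) throw new Error(`Invalid memory spec: ${spec}`);
  return Number(match[1]) * (units[(match[2] ?? "").toLowerCase()] ?? 1);
}

function buildResourceLimits(config: {
  memory?: string;
  cpu_shares?: number;
  pids_limit?: number;
}) {
  const memory = config.memory ? parseMemory(config.memory) : undefined;
  return {
    Memory: memory,
    MemorySwap: memory, // equal to Memory, so swap is effectively disabled
    CpuShares: config.cpu_shares,
    PidsLimit: config.pids_limit,
  };
}
```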

Claude Code includes native sandboxing via bubblewrap (Linux) and Seatbelt (macOS). Docker provides a different set of protections:

| Security Property | Native Sandbox | Docker |
| --- | --- | --- |
| Filesystem isolation | Access controls on the same filesystem | Separate root filesystem; only explicit mounts are visible |
| Network control | Userspace domain-filtering proxy | Kernel-level network namespaces |
| Environment variables | No filtering; full host environment accessible | Fresh environment; only explicitly passed variables |
| Resource limits | None | cgroups (memory, CPU, PID limits) |
| Ephemeral execution | Persistent state across runs | Fresh container per job (ephemeral mode) |
| Process isolation | Partial (platform-dependent) | Full PID namespace isolation |

Docker’s primary advantages are environment variable isolation (the container has no access to host environment variables unless explicitly passed), resource limits (preventing runaway memory or CPU usage), and true filesystem isolation (host files do not exist inside the container unless mounted).

Native sandboxing provides tool-level permission control that Docker does not — the permissionMode and allowedTools / deniedTools settings operate at the Claude Code level regardless of whether Docker is used.

The host_config field in fleet-level Docker configuration passes raw dockerode HostConfig options directly to the Docker API. This can override security settings:

```yaml
defaults:
  docker:
    host_config:
      NetworkMode: herdctl-net
      ShmSize: 67108864
```

This field is intentionally restricted to fleet-level configuration only. Agent-level configs use a strict schema (AgentDockerSchema) that rejects unknown fields, preventing untrusted agent configs from weakening container security.

buildContainerEnv() constructs the environment variable array for each container. Variables are passed as KEY=value strings.

| Variable | Source | Condition |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | `process.env` | If set in herdctl's environment |
| `CLAUDE_CODE_OAUTH_TOKEN` | Credentials file or `process.env` | OAuth access token |
| `CLAUDE_REFRESH_TOKEN` | Credentials file or `process.env` | OAuth refresh token |
| `CLAUDE_EXPIRES_AT` | Credentials file or `process.env` | Token expiration timestamp |
| `TERM` | Hardcoded | Always `xterm-256color` |
| `HOME` | Hardcoded | Always `/home/claude` |

Additional variables from the Docker config’s env field:

```yaml
defaults:
  docker:
    env:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"
      MY_CUSTOM_VAR: "some-value"
```

Values support ${VAR} interpolation from the host environment.
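A sketch of that interpolation (the function name is hypothetical, and treating an unset variable as an empty string is an assumption; in practice it would be called with `process.env`):

```typescript
// Replace ${VAR} references in a config value with entries from the
// host environment; unset variables become empty strings here.
function interpolateEnv(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => env[name] ?? "",
  );
}
```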

The Docker image automatically configures Git HTTPS authentication when GITHUB_TOKEN is present in the container’s environment. The entrypoint script runs:

```shell
git config --global url."https://x-access-token:${GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/"
```

This rewrites all https://github.com/ URLs to include the token, enabling git clone, git fetch, and git push operations without interactive authentication.

To pass the token, include it in the Docker environment configuration:

```yaml
defaults:
  docker:
    env:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"
```

When MCP servers are injected at runtime (for example, the Slack file sender), the SDKRuntime and ContainerRunner handle them differently:

  • SDKRuntime (non-Docker): converts InjectedMcpServerDef to an in-process MCP server using the Claude Agent SDK’s tool() and createSdkMcpServer() functions. The server runs in the same process as the SDK.
  • ContainerRunner (Docker): in-process function closures cannot be serialized into a Docker container. The solution is the MCP HTTP bridge.

An InjectedMcpServerDef contains handler functions — JavaScript closures that capture state from their enclosing scope (like a Slack API client). These closures cannot be serialized to JSON and passed into a Docker container.

ContainerRunner starts an HTTP server on the herdctl side that implements the MCP Streamable HTTP transport (JSON-RPC 2.0 over POST). The agent container connects to this server via Docker network DNS:

```
  herdctl container                     agent container
┌─────────────────────┐             ┌─────────────────────┐
│ MCP HTTP Bridge     │◄── HTTP ───►│ Claude Agent SDK    │
│ port: <random>      │             │ MCP client          │
│ host: 0.0.0.0       │             │ url: http://herdctl │
│                     │             │      :<port>/mcp    │
│ routes to in-process│             └─────────────────────┘
│ handler functions   │
└─────────────────────┘
```
  1. Start: startMcpHttpBridge(def) creates an HTTP server bound to 0.0.0.0:0 (random available port).
  2. Inject: The bridge URL (http://herdctl:<port>/mcp) is added to sdkOptions.mcpServers as an HTTP-type MCP server config.
  3. Execute: The agent container calls the bridge via HTTP during execution. The bridge translates tool calls to the in-process handler functions.
  4. Cleanup: All bridges are closed in a finally block after execution completes, regardless of success or failure.

The bridge implements a minimal subset of the MCP protocol:

| Method | Behavior |
| --- | --- |
| `initialize` | Returns server info and capabilities |
| `notifications/initialized` | Returns 204 No Content (JSON-RPC notification) |
| `tools/list` | Returns tool definitions from the `InjectedMcpServerDef` |
| `tools/call` | Executes the tool's handler function with path translation |
| `ping` | Returns an empty result |
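The bridge's dispatch logic can be sketched as a pure JSON-RPC router; the HTTP wiring via `node:http` is omitted here. This is a hedged reconstruction: method names come from this page, while `handleRpc`, `ToolHandler`, and the response shapes are illustrative assumptions.

```typescript
// Route a JSON-RPC 2.0 request to in-process tool handlers.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

async function handleRpc(
  tools: Map<string, ToolHandler>,
  rpc: JsonRpcRequest,
): Promise<object | null> {
  switch (rpc.method) {
    case "initialize":
      return { jsonrpc: "2.0", id: rpc.id, result: { serverInfo: { name: "herdctl-bridge" } } };
    case "notifications/initialized":
      return null; // JSON-RPC notification: no response body (HTTP 204)
    case "tools/list":
      return {
        jsonrpc: "2.0",
        id: rpc.id,
        result: { tools: Array.from(tools.keys()).map((name) => ({ name })) },
      };
    case "tools/call": {
      const handler = tools.get(rpc.params?.name ?? "");
      if (!handler) throw new Error(`Unknown tool: ${rpc.params?.name}`);
      const result = await handler(rpc.params?.arguments ?? {});
      return { jsonrpc: "2.0", id: rpc.id, result };
    }
    case "ping":
      return { jsonrpc: "2.0", id: rpc.id, result: {} };
    default:
      throw new Error(`Unsupported method: ${rpc.method}`);
  }
}
```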

The bridge includes path translation for the sibling container model. When the agent calls a tool with a file path like /workspace/report.pdf, the bridge strips the /workspace/ prefix before passing it to the handler. The handler runs on the host side where paths are relative to the working directory, not the container’s mount point.
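The translation itself is a simple prefix strip; a minimal sketch (the function name is hypothetical, and the `/workspace/` mount point matches the workspace mount described earlier):

```typescript
// Strip the container's mount-point prefix so host-side handlers receive
// paths relative to the working directory.
function translatePath(containerPath: string, mountPoint = "/workspace/"): string {
  return containerPath.startsWith(mountPoint)
    ? containerPath.slice(mountPoint.length)
    : containerPath;
}
```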

When injected MCP servers are present and the agent has an explicit allowedTools list, ContainerRunner automatically adds mcp__<name>__* patterns for each injected server. Without this, agents with restrictive tool lists would be unable to call injected tools — the allowedTools filter would block them.
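A sketch of that augmentation, assuming the `mcp__<name>__*` pattern from this page (the function name is hypothetical):

```typescript
// For each injected MCP server, append a wildcard pattern so its tools
// pass the allowedTools filter; skip patterns already present.
function augmentAllowedTools(allowed: string[], injectedServerNames: string[]): string[] {
  const patterns = injectedServerNames.map((name) => `mcp__${name}__*`);
  return [...allowed, ...patterns.filter((p) => !allowed.includes(p))];
}
```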

Claude OAuth access tokens have an 8-hour TTL. For long-running herdctl deployments, tokens must be refreshed automatically to avoid agent authentication failures.

The token management system uses two complementary strategies:

On every agent spawn, buildContainerEnv() reads the OAuth credentials and checks expiry:

  1. Reads ~/.claude/.credentials.json (bind-mounted from the host).
  2. Checks if the access token expires within a 5-minute buffer.
  3. If expired or expiring, calls the Claude OAuth refresh endpoint.
  4. Writes the refreshed tokens back to the credentials file (persisted via bind mount).
  5. Passes the fresh token as environment variables to the agent container.
| Parameter | Value |
| --- | --- |
| Refresh endpoint | `POST https://console.anthropic.com/v1/oauth/token` |
| Grant type | `refresh_token` |
| Client ID | `9d1c250a-e61b-44d9-88ed-5944d1962f5e` (Claude Code CLI public client) |
| Access token TTL | 8 hours |
| Refresh buffer | 5 minutes before expiry |
| Refresh token rotation | Each refresh returns a new refresh token; the old one is invalidated |
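The spawn-time expiry check amounts to a buffer comparison; a minimal sketch (the `Credentials` field names and `needsRefresh` are illustrative assumptions, while the 5-minute buffer comes from this page; the refresh call itself is omitted):

```typescript
// Decide whether the access token should be refreshed before spawning.
interface Credentials {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch ms
}

const REFRESH_BUFFER_MS = 5 * 60 * 1000;

function needsRefresh(creds: Credentials, now: number = Date.now()): boolean {
  // Refresh when the token is expired or expires within the buffer window.
  return creds.expiresAt - now < REFRESH_BUFFER_MS;
}
```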

If an agent session runs longer than 8 hours and the token expires mid-execution:

  1. The Claude SDK fails with an authentication error.
  2. isTokenExpiredError() in the job executor detects the error pattern.
  3. The job is retried automatically (one retry maximum).
  4. The retry creates a new container, triggering buildContainerEnv() which refreshes the token.
  5. The agent resumes with a fresh token in a new session.

If the credentials file is not available (no bind mount), buildContainerEnv() falls back to reading from process.env. This preserves compatibility with deployments that pass static tokens via environment variables.

| Scenario | Behavior |
| --- | --- |
| Token valid | Read from file, pass as env vars. No refresh. |
| Token expired, refresh succeeds | Refresh, write back to file, pass the fresh token. |
| Token expired, refresh fails | Pass the expired token anyway (the agent will fail). Fall back to env vars if the file is unreadable. |
| Agent runs >8h, token expires mid-session | Auth error detected, job retried with a fresh token. |
| Refresh token itself expired | Requires manual re-auth on the host (`claude` interactive login), then restarting herdctl. |
| Multiple concurrent agent spawns | Each reads the file independently. The first to refresh writes back; others read the already-refreshed file. No locking needed since refresh is idempotent. |

herdctl uses the sibling container pattern, not Docker-in-Docker (DinD). The herdctl container mounts the Docker socket (/var/run/docker.sock) and calls the Docker API to spawn agent containers as peers on the same Docker daemon:

```
Docker daemon (host)
├── herdctl container     (mounts /var/run/docker.sock)
├── agent-1 container     (spawned by herdctl via Docker API)
├── agent-2 container     (spawned by herdctl via Docker API)
└── mcp-server container  (independently deployed)
```

Agent containers are siblings of the herdctl container, not children nested inside it. This is important because:

  1. No privileged mode required — Docker-in-Docker requires --privileged, which weakens security. Sibling containers need only the Docker socket mount.
  2. Shared network — All containers can join the same Docker network for DNS-based service discovery.
  3. Resource visibility — The host Docker daemon manages all container resources directly.

The sibling container pattern introduces a critical constraint: all volume mount paths must be real host paths, not paths inside the herdctl container.

When herdctl calls the Docker API to create an agent container, the Docker daemon resolves mount source paths relative to the host filesystem, not relative to the herdctl container. This means:

  • working_directory in agent config must be the host path (e.g., /home/dev/projects/myapp), not a path inside the herdctl container (e.g., /workspace).
  • The state directory must use host paths, mounted at matching paths in both the herdctl and agent containers.
  • Docker named volumes do not work for state because the Docker daemon interprets mount sources as host paths.
```yaml
# Correct: host path
agents:
  - name: my-agent
    working_directory: /home/dev/projects/myapp

# Incorrect: container-internal path (the Docker daemon can't resolve this)
agents:
  - name: my-agent
    working_directory: /workspace
```

These options can be set in agent config files. The schema uses strict() mode to reject unknown fields at the agent level.

```yaml
docker:
  enabled: true
  ephemeral: false
  memory: 2g
  cpu_shares: 512
  pids_limit: 100
  max_containers: 5
  workspace_mode: rw
  tmpfs:
    - "/tmp"
  labels:
    team: backend
```

These options are set in herdctl.yaml under defaults.docker. They include all agent-level options plus security-sensitive options:

```yaml
defaults:
  docker:
    enabled: true
    image: herdctl/runtime:latest
    network: bridge
    memory: 2g
    user: "1000:1000"
    ephemeral: false
    volumes:
      - "/host/data:/data:ro"
    env:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"
    ports:
      - "8080:80"
    host_config:
      NetworkMode: herdctl-net
```

Docker configuration follows the same merge strategy as other agent settings: agent-level values override fleet-level defaults. However, security-sensitive fields (image, network, volumes, env, user, ports, host_config) are only accepted at the fleet level.

| File | Purpose |
| --- | --- |
| `Dockerfile` | Runtime image definition |
| `packages/core/src/runner/runtime/container-runner.ts` | `ContainerRunner` decorator — execution delegation for CLI and SDK runtimes in Docker |
| `packages/core/src/runner/runtime/container-manager.ts` | `ContainerManager` — container lifecycle (create, start, stop, remove, cleanup), `buildContainerMounts()`, `buildContainerEnv()`, OAuth token refresh |
| `packages/core/src/runner/runtime/docker-config.ts` | `DockerConfig` type, `resolveDockerConfig()`, parsers for memory, ports, volumes, tmpfs |
| `packages/core/src/runner/runtime/mcp-http-bridge.ts` | `startMcpHttpBridge()` — HTTP server implementing the MCP Streamable HTTP transport for Docker |
| `packages/core/src/runner/runtime/docker-sdk-wrapper.js` | In-container wrapper script that runs the Claude Agent SDK and streams JSONL to stdout |
| `packages/core/src/runner/runtime/factory.ts` | `RuntimeFactory` — composes `ContainerRunner` around base runtimes when Docker is enabled |
| `packages/core/src/runner/runtime/interface.ts` | `RuntimeInterface` and `RuntimeExecuteOptions` types |
| `packages/core/src/runner/types.ts` | `InjectedMcpServerDef`, `InjectedMcpToolDef`, `McpToolCallResult` types |
| `packages/core/src/config/schema.ts` | `AgentDockerSchema`, `FleetDockerSchema` — Zod validation schemas |
| `packages/core/src/state/session-validation.ts` | `isTokenExpiredError()` — detects OAuth token expiry errors for retry |