System Architecture Overview

This page is the entry point for the herdctl architecture section. It describes the overall shape of the system: what the major subsystems are, how they relate to each other, and how data flows at runtime. Start here before diving into any individual subsystem.

The following diagram shows how @herdctl/core sits at the center of the system, with FleetManager orchestrating all internal modules and exposing a unified API to every interaction layer.

[Diagram: core architecture, showing FleetManager as the central orchestrator connecting ConfigLoader, Scheduler, StateManager, Runner, JobManager, Web, and Chat]

herdctl is organized as a monorepo with six npm packages. Dependencies flow strictly in one direction: interaction-layer packages depend on core, never the reverse.

[Diagram: package dependency graph, showing the relationships between @herdctl/core, @herdctl/chat, @herdctl/discord, @herdctl/slack, @herdctl/web, and the herdctl CLI]

| Package | npm Name | Role |
| --- | --- | --- |
| core | @herdctl/core | All business logic: configuration, state, runner, scheduler, work sources, FleetManager orchestration. Every other package depends on this. |
| chat | @herdctl/chat | Shared chat infrastructure: session management, streaming response buffering, message extraction, error handling. Used by Discord and Slack connectors. |
| discord | @herdctl/discord | Discord connector: discord.js client, slash commands, mention handling, per-agent bot model. Depends on core and chat. |
| slack | @herdctl/slack | Slack connector: Bolt App, Socket Mode, prefix commands, channel-to-agent routing, single-app model. Depends on core and chat. |
| web | @herdctl/web | Web dashboard and HTTP API: Vite + React frontend, Fastify backend, real-time event streaming. Depends on core. |
| cli | herdctl | Command-line interface: thin wrapper over FleetManager. Depends on core. |

Core never imports from any interaction-layer package. Instead, FleetManager discovers optional packages (@herdctl/discord, @herdctl/slack, @herdctl/web) at runtime via dynamic imports when the fleet configuration references them. This avoids hard dependencies on optional packages that a user may not have installed.
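
The runtime discovery described above can be sketched with a dynamic `import()`. This is a minimal illustration, not herdctl's actual loader: the helper name `loadOptional` is hypothetical, and treating every import failure as "not installed" is a simplifying assumption.

```typescript
// Hypothetical helper, not part of the herdctl API.
// Tries to load a package at runtime and returns null if the import fails.
async function loadOptional(
  specifier: string,
): Promise<Record<string, unknown> | null> {
  try {
    // import() is resolved at runtime, so a missing optional package
    // cannot break module loading of the caller.
    return (await import(specifier)) as Record<string, unknown>;
  } catch {
    // Simplification: any failure is treated as "package not available".
    // A production loader would likely distinguish a missing package
    // from a package that throws while it is being loaded.
    return null;
  }
}

// Example: only wire up an integration when its package can be loaded.
async function describeIntegration(specifier: string): Promise<string> {
  const mod = await loadOptional(specifier);
  return mod === null ? `${specifier}: skipped` : `${specifier}: loaded`;
}
```

Calling `describeIntegration("@herdctl/discord")` in a project without that package installed would report it as skipped rather than throwing.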

All business logic lives in @herdctl/core. The CLI, web dashboard, HTTP API, and chat connectors are thin wrappers that delegate to FleetManager. A developer can use @herdctl/core directly as a library without any interaction layer:

import { FleetManager } from "@herdctl/core";

const fleet = new FleetManager({
  configPath: "./agents",
  stateDir: "./.herdctl",
});

await fleet.initialize();
await fleet.start();

// Subscribe to events
fleet.on("job:completed", (payload) => {
  console.log(`Job ${payload.job.id} finished in ${payload.durationSeconds}s`);
});

// Query state
const status = fleet.getFleetStatus();

// Trigger manually
await fleet.trigger("my-agent");

// Shut down
await fleet.stop({ waitForJobs: true, timeout: 30000 });

Each interaction layer adds only what is specific to its medium:

  • CLI adds argument parsing, output formatting, and exit code handling.
  • Web adds HTTP routing, SSE/WebSocket transport, and React rendering.
  • Discord adds discord.js event handling, slash commands, and rich embeds.
  • Slack adds Bolt event handling, Socket Mode, and mrkdwn conversion.

None of these packages contain business logic. They translate between their medium and FleetManager’s API.

A fleet runs in a single Node.js process. FleetManager manages the scheduler loop, configuration, and state within that process. Agents are executed as child processes (via the Claude SDK or CLI) or as Docker containers, but FleetManager itself is not distributed.

FleetManager uses composition rather than inheritance. Internal functionality is split into focused module classes that each receive a shared FleetManagerContext interface. This keeps individual files small and testable while presenting a single public API surface. See Module Composition Pattern below for details.

FleetManager is the single orchestration point for the entire system. All interaction layers go through FleetManager rather than calling lower-level modules directly.

| Responsibility | Description |
| --- | --- |
| Lifecycle management | initialize(), start(), stop(), reload(): full fleet lifecycle |
| Configuration loading | Delegates to ConfigLoader to discover, parse, validate, and resolve herdctl.yaml |
| State directory setup | Delegates to StateManager to create and manage the .herdctl/ directory |
| Scheduler creation | Creates and controls the Scheduler instance with trigger callbacks routed back into FleetManager |
| Job execution | Creates jobs via JobManager, hands execution to the Runner, streams output, reports completion |
| Event emission | Typed events for all lifecycle transitions, consumed by the web dashboard, CLI, and library consumers |
| Chat manager loading | Dynamically imports and initializes Discord, Slack, and Web managers when configured |
| Schedule control | Enable/disable schedules at runtime without editing config files |
| Hot config reload | Reload configuration without restarting; running jobs continue with their original config |

| Component | Source | Responsibility |
| --- | --- | --- |
| ConfigLoader | config/ | Discovers herdctl.yaml, parses YAML, validates with Zod schemas, resolves fleet composition (sub-fleets), merges defaults, interpolates environment variables. See Configuration System. |
| Scheduler | scheduler/ | Polling loop that checks agent schedules (interval and cron). Evaluates trigger conditions, respects concurrency limits, fires callbacks into FleetManager. See Schedule System. |
| StateManager | state/ | Manages the .herdctl/ directory: fleet state (state.yaml), job metadata (YAML), streaming output (JSONL), and session info (JSON). All writes are atomic. See State Persistence. |
| Runner | runner/ | Executes agents. The SDK adapter transforms agent config into Claude SDK options. Supports SDK runtime, CLI runtime, and Docker runtime. Streams messages in real time. See Agent Execution Engine. |
| JobManager | fleet-manager/job-manager.ts | Manages job lifecycle: creation, queuing, priority, retention, and output streaming. Works with the Runner to track active jobs. See Job Lifecycle. |
| Chat Managers | @herdctl/web, @herdctl/discord, @herdctl/slack | Dynamically loaded by FleetManager when configured. The web manager serves the dashboard and API; chat managers bridge Discord and Slack to agents. See Chat Infrastructure, Discord, Slack, Web Dashboard. |

FleetManager uses a module composition pattern. Each area of functionality is extracted into a focused class that receives a FleetManagerContext interface at construction time:

| Module Class | File | Responsibility |
| --- | --- | --- |
| StatusQueries | status-queries.ts | Fleet status, agent info, schedule info queries |
| ScheduleManagement | schedule-management.ts | Enable/disable schedules, query schedule state |
| ScheduleExecutor | schedule-executor.ts | Execute triggered schedules via the Runner |
| JobControl | job-control.ts | Trigger, cancel, and fork jobs |
| LogStreaming | log-streaming.ts | Async iterable log streams for fleet, job, and agent output |
| ConfigReload | config-reload.ts | Hot-reload configuration, compute config diffs |

Each module class is instantiated once in the FleetManager constructor. Rather than passing individual dependencies to every method call, modules access shared state through the FleetManagerContext interface:

export interface FleetManagerContext {
  getConfig(): ResolvedConfig | null;
  getStateDir(): string;
  getStateDirInfo(): StateDirectory | null;
  getLogger(): FleetManagerLogger;
  getScheduler(): Scheduler | null;
  getStatus(): FleetManagerStatus;
  getInitializedAt(): string | null;
  getStartedAt(): string | null;
  getStoppedAt(): string | null;
  getLastError(): string | null;
  getCheckInterval(): number;
  emit(event: string, ...args: unknown[]): boolean;
  getEmitter(): EventEmitter;
  getChatManager?(platform: string): IChatManager | undefined;
  getChatManagers?(): Map<string, IChatManager>;
  trigger(agentName: string, scheduleName?: string, options?: TriggerOptions): Promise<TriggerResult>;
}

FleetManager implements this interface and passes itself (this) to each module. The modules call context getters to access current state, and call emit() to fire events. This pattern provides one-way dependency flow (modules depend on context, never on each other) and keeps each module independently testable with a mock context.

FleetManager (extends EventEmitter, implements FleetManagerContext)
├── StatusQueries(this) → getFleetStatus(), getAgentInfo(), ...
├── ScheduleManagement(this) → getSchedules(), enableSchedule(), ...
├── ScheduleExecutor(this) → executeSchedule()
├── JobControl(this) → trigger(), cancelJob(), forkJob()
├── LogStreaming(this) → streamLogs(), streamJobOutput(), ...
└── ConfigReload(this) → reload()

FleetManager’s public API delegates directly to these modules. For example, fleet.getFleetStatus() calls this.statusQueries.getFleetStatus(). The public interface remains a single class with a clean API; the implementation is spread across focused files.
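
To make the pattern concrete, here is a reduced sketch of the same idea. The names `MiniContext`, `MiniStatusQueries`, and `MiniFleet` are hypothetical stand-ins with only three context members; they are not the real herdctl classes.

```typescript
// Simplified stand-ins for illustration only.
type Status = "uninitialized" | "initialized" | "running" | "stopping" | "stopped";

interface MiniContext {
  getStatus(): Status;
  getStartedAt(): string | null;
  emit(event: string, ...args: unknown[]): boolean;
}

// A focused module: it depends only on the context, never on other modules.
class MiniStatusQueries {
  private readonly ctx: MiniContext;

  constructor(ctx: MiniContext) {
    this.ctx = ctx;
  }

  getSummary(): { status: Status; startedAt: string | null } {
    return { status: this.ctx.getStatus(), startedAt: this.ctx.getStartedAt() };
  }
}

// The owning class implements the context and passes itself to the module.
class MiniFleet implements MiniContext {
  private status: Status = "uninitialized";
  private startedAt: string | null = null;
  private readonly statusQueries = new MiniStatusQueries(this);

  getStatus(): Status { return this.status; }
  getStartedAt(): string | null { return this.startedAt; }
  emit(_event: string, ..._args: unknown[]): boolean { return false; }

  start(): void {
    this.status = "running";
    this.startedAt = new Date().toISOString();
  }

  // The public API delegates to the module.
  getSummary(): { status: Status; startedAt: string | null } {
    return this.statusQueries.getSummary();
  }
}

// In a unit test, a plain object can stand in for the whole fleet.
const mockContext: MiniContext = {
  getStatus: () => "running",
  getStartedAt: () => "2024-01-01T00:00:00.000Z",
  emit: () => true,
};
const mockedSummary = new MiniStatusQueries(mockContext).getSummary();
```

Because the module reads state through getters at call time, it always sees the current status even though it was constructed before `start()` ran.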

FleetManager extends Node.js EventEmitter and provides strongly-typed events for every lifecycle transition. The event system is what powers real-time updates in the web dashboard, live log streaming in the CLI, and monitoring in library consumers.

Lifecycle events:

| Event | Payload | When Emitted |
| --- | --- | --- |
| initialized | (none) | Config loaded, state directory ready |
| started | (none) | Scheduler running, schedules being monitored |
| stopped | (none) | All jobs completed or timed out, scheduler stopped |
| error | Error | Unhandled error not tied to a specific job |

Configuration events:

| Event | Payload | When Emitted |
| --- | --- | --- |
| config:reloaded | { agentCount, agentNames, configPath, changes[], timestamp } | Configuration hot-reloaded from disk |

Agent events:

| Event | Payload | When Emitted |
| --- | --- | --- |
| agent:started | { agent, timestamp } | Agent registered with the fleet |
| agent:stopped | { agentName, timestamp, reason? } | Agent unregistered from the fleet |

Schedule events:

| Event | Payload | When Emitted |
| --- | --- | --- |
| schedule:triggered | { agentName, scheduleName, schedule, timestamp } | Schedule fires, before job creation |
| schedule:skipped | { agentName, scheduleName, reason, timestamp } | Schedule check skipped (already running, disabled, concurrency limit, empty work source) |

Job events:

| Event | Payload | When Emitted |
| --- | --- | --- |
| job:created | { job, agentName, scheduleName?, timestamp } | New job created (status: pending) |
| job:output | { jobId, agentName, output, outputType, timestamp } | Job produces output during execution |
| job:completed | { job, agentName, exitReason, durationSeconds, timestamp } | Job finishes successfully |
| job:failed | { job, agentName, error, exitReason, durationSeconds?, timestamp } | Job fails with error |
| job:cancelled | { job, agentName, terminationType, durationSeconds?, timestamp } | Job cancelled (graceful, forced, or already stopped) |
| job:forked | { job, originalJob, agentName, timestamp } | New job created by forking an existing job’s session |

Event consumers:

  • Web dashboard: Subscribes to all events via SSE/WebSocket to update the UI in real time.
  • CLI: Subscribes to job:output for live log streaming during herdctl logs --follow.
  • Library consumers: Subscribe to any event for custom monitoring, alerting, or integration.
  • Chat managers: Subscribe to job events to stream agent responses back to Discord/Slack channels.
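
One common way to get strongly typed events on top of Node's EventEmitter is an event map plus thin typed wrappers. The sketch below assumes that approach; `FleetEvents`, `TypedFleetEmitter`, `onEvent`, and `emitEvent` are hypothetical names, and the payloads are abbreviated from the event tables in this section. herdctl's own typing mechanism may differ.

```typescript
import { EventEmitter } from "node:events";

// Illustrative subset of an event map with abbreviated payloads.
interface FleetEvents {
  "job:output": { jobId: string; agentName: string; output: string };
  "job:completed": { agentName: string; durationSeconds: number };
}

// Hypothetical wrapper that ties each event name to its payload type.
class TypedFleetEmitter extends EventEmitter {
  onEvent<K extends keyof FleetEvents>(
    name: K,
    listener: (payload: FleetEvents[K]) => void,
  ): this {
    return this.on(name, listener as (...args: any[]) => void);
  }

  emitEvent<K extends keyof FleetEvents>(name: K, payload: FleetEvents[K]): boolean {
    return this.emit(name, payload);
  }
}

// A consumer in the style of a live log follower: collect output as it arrives.
const emitter = new TypedFleetEmitter();
const lines: string[] = [];

emitter.onEvent("job:output", (payload) => {
  // payload is typed, so a misspelled field name fails at compile time.
  lines.push(`[${payload.agentName}] ${payload.output}`);
});

emitter.emitEvent("job:output", {
  jobId: "job-1",
  agentName: "my-agent",
  output: "hello",
});
```

Listeners run synchronously inside `emit()`, so `lines` already contains the formatted entry when `emitEvent` returns.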

When a consumer calls initialize() followed by start(), the following sequence occurs:

  1. Load configuration — ConfigLoader discovers and parses herdctl.yaml, loads all agent configs, resolves fleet composition (sub-fleets), merges defaults, and returns a ResolvedConfig.
  2. Initialize state directory — StateManager creates the .herdctl/ directory structure if it does not exist, including subdirectories for jobs and sessions.
  3. Create Scheduler — A new Scheduler instance is created with agent definitions and a trigger callback that routes back into FleetManager.
  4. Load chat managers — FleetManager dynamically imports @herdctl/discord, @herdctl/slack, and @herdctl/web if agents or fleet config reference them. This avoids hard dependencies on optional packages.
  5. Start scheduler — On start(), the Scheduler begins its polling loop, checking all agent schedules on each tick.

The FleetManager tracks its own lifecycle through a state machine: uninitialized -> initialized -> running -> stopping -> stopped. Operations that require a specific state (e.g., trigger() requires running) throw InvalidStateError if called at the wrong time.
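
A state guard of this kind can be sketched as follows. The class `LifecycleSketch` and its `requireState` helper are hypothetical, `InvalidStateError` here is a simplified stand-in for the real class, and the exact set of allowed transitions (for example, whether a stopped fleet can be re-initialized) is an assumption.

```typescript
type FleetStatus = "uninitialized" | "initialized" | "running" | "stopping" | "stopped";

// Simplified stand-in for the error class named in the text.
class InvalidStateError extends Error {
  readonly code = "INVALID_STATE";

  constructor(operation: string, current: FleetStatus, required: FleetStatus) {
    super(`${operation}() requires state "${required}" but fleet is "${current}"`);
    this.name = "InvalidStateError";
  }
}

class LifecycleSketch {
  private status: FleetStatus = "uninitialized";

  // Throws unless the fleet is in the state the operation needs.
  private requireState(operation: string, required: FleetStatus): void {
    if (this.status !== required) {
      throw new InvalidStateError(operation, this.status, required);
    }
  }

  initialize(): void {
    this.requireState("initialize", "uninitialized");
    this.status = "initialized";
  }

  start(): void {
    this.requireState("start", "initialized");
    this.status = "running";
  }

  trigger(agentName: string): string {
    this.requireState("trigger", "running");
    return `triggered ${agentName}`;
  }

  stop(): void {
    this.requireState("stop", "running");
    this.status = "stopping";
    // A real implementation would drain running jobs here before finishing.
    this.status = "stopped";
  }

  getStatus(): FleetStatus {
    return this.status;
  }
}
```

With this shape, calling `trigger()` on a freshly constructed instance throws an `InvalidStateError` whose `code` is `"INVALID_STATE"`, while the same call succeeds after `initialize()` and `start()`.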

Once started, the system follows a continuous loop:

  1. Scheduler polls — On each tick (default: every 1 second), the Scheduler checks every agent’s schedules against their state (last run time, concurrency limits, enabled/disabled status).
  2. Schedule fires — When a schedule is due, the Scheduler calls FleetManager’s trigger callback with agent and schedule information. FleetManager emits schedule:triggered.
  3. Job created — FleetManager creates a job record via JobManager and StateManager. The job starts in pending status. FleetManager emits job:created.
  4. Runner executes — The ScheduleExecutor hands the job to the Runner. The Runner transforms agent config via the SDK adapter, selects a runtime (SDK, CLI, or Docker), invokes the agent, and streams messages back as they arrive.
  5. Output persisted — Each message is appended to the job’s JSONL file and emitted as a job:output event. Consumers (web dashboard, CLI, library code) receive output in real time.
  6. Job completes — The Runner reports completion or failure. StateManager updates state.yaml with the agent’s last run time and next trigger time. FleetManager emits job:completed or job:failed.
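
The per-tick check in step 1 can be pictured as a pure function over a schedule's state. This is a sketch for interval schedules only: the `ScheduleState` shape, the `TickDecision` type, the reason strings, and the choice to fire a never-run schedule immediately are all assumptions, not herdctl's actual data model.

```typescript
// Hypothetical shapes for illustration.
interface ScheduleState {
  enabled: boolean;
  lastRunAt: number | null; // epoch milliseconds of the last run, null if never run
  intervalMs: number;
  runningJobs: number;
  maxConcurrent: number;
}

type TickDecision =
  | { action: "fire" }
  | { action: "skip"; reason: "disabled" | "concurrency_limit" }
  | { action: "wait" };

// Decides what a single scheduler tick should do with one schedule.
function evaluateSchedule(s: ScheduleState, now: number): TickDecision {
  if (!s.enabled) {
    return { action: "skip", reason: "disabled" };
  }
  // Assumption: a schedule that has never run is due immediately.
  const due = s.lastRunAt === null || now - s.lastRunAt >= s.intervalMs;
  if (!due) {
    return { action: "wait" };
  }
  if (s.runningJobs >= s.maxConcurrent) {
    return { action: "skip", reason: "concurrency_limit" };
  }
  return { action: "fire" };
}
```

Keeping the decision free of side effects makes it easy to test with fixed timestamps; the caller is then responsible for emitting the corresponding event and creating the job.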

This loop continues until stop() is called, at which point the Scheduler stops polling, running jobs optionally drain (with a configurable timeout), and the stopped event is emitted.
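
Draining with a timeout is typically a race between "all jobs settled" and a timer. The helper below is a minimal sketch under that assumption; `drainJobs` and its return values are hypothetical and do not describe how `stop()` is actually implemented.

```typescript
// Hypothetical helper: waits for in-flight jobs, giving up after timeoutMs.
function drainJobs(
  running: Promise<unknown>[],
  timeoutMs: number,
): Promise<"drained" | "timed_out"> {
  // Swallow individual job failures so one failed job does not abort the drain.
  const drained = Promise.all(
    running.map((job) => job.catch(() => undefined)),
  ).then(() => "drained" as const);

  const timedOut = new Promise<"timed_out">((resolve) => {
    const timer = setTimeout(() => resolve("timed_out"), timeoutMs);
    // Cancel the timer once the jobs settle so it does not keep the process alive.
    void drained.then(() => clearTimeout(timer));
  });

  // Whichever settles first decides the outcome.
  return Promise.race([drained, timedOut]);
}
```

A caller could pass the timeout from a call such as `fleet.stop({ waitForJobs: true, timeout: 30000 })` and decide on a `"timed_out"` result whether to force-terminate the remaining jobs.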

Agents in herdctl are defined declaratively in YAML and resolved into ResolvedAgent objects at configuration time. Each agent specifies its system prompt, schedules, permissions, MCP servers, and optional chat configuration.

[Diagram: agent composition, showing how agent YAML config resolves into a ResolvedAgent with schedules, permissions, MCP servers, and chat configuration]

For details on agent configuration, see Configuration System. For details on how agents are executed, see Agent Execution Engine.

FleetManager defines a typed error hierarchy rooted at FleetManagerError. All errors include a code string for programmatic discrimination and contextual information for debugging:

| Error Class | Code | When Thrown |
| --- | --- | --- |
| ConfigurationError | CONFIGURATION_ERROR | Config loading or validation failed |
| AgentNotFoundError | AGENT_NOT_FOUND | Referenced agent does not exist in config |
| JobNotFoundError | JOB_NOT_FOUND | Referenced job ID does not exist |
| ScheduleNotFoundError | SCHEDULE_NOT_FOUND | Referenced schedule does not exist on the agent |
| InvalidStateError | INVALID_STATE | Operation called in wrong lifecycle state |
| ConcurrencyLimitError | CONCURRENCY_LIMIT | Agent or fleet-wide concurrency limit reached |

All errors extend FleetManagerError, which itself extends Error. TypeScript consumers can use instanceof checks or the code property for error discrimination.
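
Both discrimination styles can be combined in one handler. The classes below are simplified local stand-ins that mirror the class names and codes from the table; their constructors, the `agentName` field, and the `describeError` function are hypothetical and not the real @herdctl/core definitions.

```typescript
// Simplified stand-ins for the documented hierarchy.
class FleetManagerError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.name = new.target.name;
    this.code = code;
  }
}

class AgentNotFoundError extends FleetManagerError {
  readonly agentName: string; // hypothetical context field

  constructor(agentName: string) {
    super(`Agent "${agentName}" not found`, "AGENT_NOT_FOUND");
    this.agentName = agentName;
  }
}

class ConcurrencyLimitError extends FleetManagerError {
  constructor(agentName: string) {
    super(`Concurrency limit reached for "${agentName}"`, "CONCURRENCY_LIMIT");
  }
}

// Maps an unknown thrown value to a short, user-facing message.
function describeError(err: unknown): string {
  // instanceof gives access to class-specific context.
  if (err instanceof AgentNotFoundError) {
    return `unknown agent: ${err.agentName}`;
  }
  // The code property allows a switch without importing every subclass.
  if (err instanceof FleetManagerError) {
    switch (err.code) {
      case "CONCURRENCY_LIMIT":
        return "busy, retry later";
      default:
        return `fleet error (${err.code})`;
    }
  }
  return "unexpected error";
}
```

The `code` check is also useful when an error crosses a boundary where `instanceof` is unreliable, such as serialized errors or duplicate copies of a package.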