Job System
A job is a single execution of an agent. Every time an agent runs — whether triggered by a schedule, a chat message, a manual CLI command, or a fork of a previous job — herdctl creates a job to track the execution from start to finish. The job system provides the metadata schema, status lifecycle, output file management, and event infrastructure that the rest of herdctl depends on.
This page consolidates all job-specific architecture into one reference. For details on how jobs are executed, see the Runner. For how job files are stored on disk, see State Management.
What Is a Job
Section titled “What Is a Job”A job represents one invocation of an agent with a prompt. It captures:
- Identity — which agent ran, which schedule (if any) triggered it, and how it was triggered.
- Lifecycle — when it started, its current status, when it finished, and why it ended.
- Context — the prompt given to the agent, the Claude session ID for resume/fork, and a reference to the parent job if forked.
- Results — a summary of what the agent accomplished and a path to the full streaming output log.
Jobs are the primary unit of work in herdctl. The Scheduler creates jobs when schedules fire. The Runner executes them. The State Manager persists them. The Web Dashboard and CLI display them.
JobManager Module
Section titled “JobManager Module”Job operations are split across two modules in @herdctl/core:
| Module | Location | Responsibility |
|---|---|---|
| Job metadata | packages/core/src/state/job-metadata.ts | CRUD operations for job metadata files (create, read, update, delete, list with filters). |
| Job output | packages/core/src/state/job-output.ts | Append and read operations for JSONL streaming output files. |
| Job control | packages/core/src/fleet-manager/job-control.ts | Higher-level orchestration: trigger, cancel, and fork operations coordinated through FleetManager. |
The JobControl class in job-control.ts is the primary interface for job operations at the fleet level. It handles concurrency checks, session lookup, runtime selection, event emission, and hook execution around the core CRUD operations.
Job ID Format
Section titled “Job ID Format”Every job receives a unique identifier with the format:
job-YYYY-MM-DD-<random6>For example: job-2024-01-19-abc123
The date portion uses the job creation date. The suffix is 6 lowercase alphanumeric characters generated from Math.random().toString(36). The format is enforced by a regex in JobMetadataSchema:
/^job-\d{4}-\d{2}-\d{2}-[a-z0-9]{6}$/The generateJobId() function accepts optional Date and random function parameters to enable deterministic testing.
Job Metadata Schema
Section titled “Job Metadata Schema”Each job is persisted as a YAML file at .herdctl/jobs/<job-id>.yaml. The schema is defined by JobMetadataSchema (a Zod schema in packages/core/src/state/schemas/job-metadata.ts).
id: job-2024-01-19-abc123agent: bragdoc-marketerschedule: daily-analyticstrigger_type: schedule
status: completedexit_reason: success
session_id: claude-session-xyz789forked_from: null
started_at: "2024-01-19T09:00:00Z"finished_at: "2024-01-19T09:05:23Z"duration_seconds: 323
prompt: | Analyze site traffic for the past 24 hours. Create a brief report and post to #marketing channel.
summary: "Generated daily analytics report. Traffic up 12% from yesterday."output_file: job-2024-01-19-abc123.jsonlField Reference
Section titled “Field Reference”| Field | Type | Description |
|---|---|---|
id | string | Unique identifier in job-YYYY-MM-DD-<random6> format |
agent | string | Qualified name of the executing agent |
schedule | string? | Schedule name that triggered the job (null for manual/chat triggers) |
trigger_type | enum | How the job was started (see Trigger Types) |
status | enum | Current job status (see Status Lifecycle) |
exit_reason | enum | Why the job ended (see Exit Reasons), null while running |
session_id | string? | Claude session ID for resume/fork capability |
forked_from | string? | Parent job ID when trigger_type is fork |
started_at | string | ISO timestamp when job was created |
finished_at | string? | ISO timestamp when job finished (null while running) |
duration_seconds | number? | Total execution time in seconds (auto-calculated when finished_at is set) |
prompt | string? | The prompt given to the agent |
summary | string? | Brief summary of job results (extracted from the final assistant message) |
output_file | string? | Filename of the JSONL output log |
Immutable Fields
Section titled “Immutable Fields”When updating a job via updateJob(), the following fields cannot be changed: id, agent, trigger_type, and started_at. These are set at creation time and remain fixed for the job’s lifetime.
Automatic Duration Calculation
Section titled “Automatic Duration Calculation”When finished_at is set during an update and duration_seconds is not explicitly provided, updateJob() automatically calculates the duration from started_at and finished_at.
Trigger Types
Section titled “Trigger Types”The trigger_type field records how a job was initiated:
| Trigger Type | Description |
|---|---|
manual | Started via CLI command or programmatic API call |
schedule | Triggered by the Scheduler when a schedule fires |
webhook | Triggered by an external HTTP webhook |
chat | Triggered by a generic chat message |
discord | Triggered by a Discord message |
slack | Triggered by a Slack message |
web | Triggered from the web dashboard |
fork | Forked from a previous job’s session |
The trigger type is set at job creation time and is immutable. Platform-specific connectors (Discord, Slack) set the appropriate trigger type so that job history can be filtered by source.
Status Lifecycle
Section titled “Status Lifecycle”A job progresses through a linear status lifecycle:
Status Values
Section titled “Status Values”| Status | Description |
|---|---|
pending | Job record created but execution has not begun |
running | Agent is actively executing the job |
completed | Job finished successfully |
failed | Job ended with an error |
cancelled | Job was cancelled by user or system shutdown |
Transitions
Section titled “Transitions”The job starts in pending when createJob() writes the metadata file. The Runner transitions it to running when the SDK execution loop begins. On completion, the Runner sets the final status based on the outcome:
- Success: status becomes
completed, exit_reason is set tosuccess. - Error: status becomes
failed, exit_reason indicates the failure type. - Cancellation: status becomes
cancelled, exit_reason iscancelled.
A job in a terminal state (completed, failed, cancelled) is never updated again. There is no mechanism to restart a finished job — instead, fork it to create a new job with the same context.
For details on how the Runner executes these transitions and manages the SDK session, see Agent Execution Engine.
Relationship to Agent State
Section titled “Relationship to Agent State”When a job starts running, the agent’s entry in state.yaml is updated: status becomes running and current_job is set to the job ID. When the job finishes, the agent returns to idle (or error if the job failed), current_job is cleared, and last_job is set to the completed job’s ID.
Exit Reasons
Section titled “Exit Reasons”The exit_reason field captures why a job ended. It is set when the job transitions to a terminal status.
| Exit Reason | Description |
|---|---|
success | Job completed naturally — the agent finished its work |
error | Job failed due to an unrecoverable error (SDK error, API failure, etc.) |
timeout | Job exceeded its configured time limit or encountered a network timeout |
cancelled | Job was explicitly cancelled by user intervention or system shutdown |
max_turns | Job reached the maximum number of conversation turns configured for the agent |
The Runner’s classifyError() function examines error messages and codes to determine the appropriate exit reason. For example, ETIMEDOUT errors map to the timeout exit reason, while AbortController signals map to cancelled.
Job Forking
Section titled “Job Forking”Forking creates a new job based on an existing one. The new job inherits the original’s agent and schedule but receives a fresh job ID and starts in pending status. The forked_from field links back to the parent job.
Fork Mechanics
Section titled “Fork Mechanics”The JobControl.forkJob() method:
- Reads the original job’s metadata.
- Validates that the agent still exists in the current configuration.
- Creates a new job with
trigger_type: "fork"andforked_fromset to the original job ID. - Allows optional modifications: a different prompt or schedule name.
- Emits
job:createdandjob:forkedevents.
const result = await manager.forkJob("job-2024-01-19-abc123", { prompt: "Try a different approach to the same task",});// result.jobId → "job-2024-01-19-def456"// result.forkedFromJobId → "job-2024-01-19-abc123"Fork vs Resume
Section titled “Fork vs Resume”Forking and resuming both build on previous sessions but serve different purposes:
| Operation | Session behavior | Use case |
|---|---|---|
| Resume | Continues the exact same session | Pick up where the previous job left off |
| Fork | Branches from the session to explore alternatives | Try a different approach without losing the original |
For details on session management, see Runner: Session Management.
Output File Management
Section titled “Output File Management”Each job produces a JSONL output file alongside its metadata file. The two files share the same base name:
.herdctl/jobs/├── job-2024-01-19-abc123.yaml # Metadata├── job-2024-01-19-abc123.jsonl # Streaming output├── job-2024-01-19-def456.yaml└── job-2024-01-19-def456.jsonlWriting Output
Section titled “Writing Output”The Runner writes to the JSONL file in real time during execution via appendJobOutput(). Each call appends a single JSON line with an automatically-added timestamp. There is no buffering — messages appear on disk immediately, enabling concurrent readers.
await appendJobOutput(jobsDir, jobId, { type: "assistant", content: "Analyzing the codebase...",});Batch writing is also supported via appendJobOutputBatch(), which validates all messages before writing any of them.
Output Message Types
Section titled “Output Message Types”The JSONL file contains five message types, defined by a Zod discriminated union (JobOutputMessageSchema):
| Type | Description | Key Fields |
|---|---|---|
system | Session lifecycle events (init, end, complete) | subtype, content |
assistant | Claude’s text responses | content, partial, usage |
tool_use | Tool invocations by the agent | tool_name, tool_use_id, input |
tool_result | Results from tool execution | tool_use_id, result, success, error |
error | Error events | message, code, stack |
Every message includes a type field and a timestamp. The isValidJobOutputInput() type guard validates that a message has the minimum required fields before writing.
For complete field-level details on each message type, see State Management: JSONL Message Types.
JSONL Properties
Section titled “JSONL Properties”The JSONL format provides several important properties:
- Streamable — each line is self-contained, so the file can be read while still being written to.
- Append-only — messages are never inserted, overwritten, or reordered.
- Replayable — the full conversation can be reconstructed by reading from top to bottom.
- No buffering —
fs.appendFilewrites each message immediately.
For details on write atomicity and concurrent safety, see State Management: JSONL Append Semantics.
Reading Job Output
Section titled “Reading Job Output”The job output module provides two APIs for reading output: a streaming async generator and a convenience function that collects all messages into an array.
Async Generator (Streaming)
Section titled “Async Generator (Streaming)”The readJobOutput() function returns an AsyncGenerator that yields messages one at a time. This is the memory-efficient choice for large output files:
import { readJobOutput } from "@herdctl/core";
for await (const message of readJobOutput(jobsDir, jobId)) { switch (message.type) { case "assistant": console.log(`[Claude] ${message.content}`); break; case "tool_use": console.log(`[Tool] ${message.tool_name}: ${message.input}`); break; case "error": console.error(`[Error] ${message.message}`); break; }}The generator handles several edge cases:
- File not found — yields nothing (does not throw).
- Empty lines — silently skipped.
- Invalid lines — throws by default, or skips with
skipInvalidLines: true. - Resource cleanup — the readline interface and file stream are closed in a
finallyblock.
Collecting All Messages
Section titled “Collecting All Messages”The readJobOutputAll() convenience function collects every message into an array:
import { readJobOutputAll } from "@herdctl/core";
const messages = await readJobOutputAll(jobsDir, jobId);console.log(`Total messages: ${messages.length}`);For large output files, prefer the streaming generator to avoid loading the entire file into memory.
Tailing Output (tail -f)
Section titled “Tailing Output (tail -f)”Because JSONL files are append-only with immediate writes, standard Unix tools work for real-time monitoring:
tail -f .herdctl/jobs/job-2024-01-19-abc123.jsonl | jq .This is useful for debugging from the terminal without involving herdctl’s programmatic APIs.
Extracting Final Output
Section titled “Extracting Final Output”The JobControl.getJobFinalOutput() method reads a completed job’s JSONL file and extracts the last meaningful content. It prioritizes assistant text messages over raw tool results, since the agent’s response is typically what callers care about. This extracted output is used by the hook system to populate the result.output field in HookContext.
Job Retention and Cleanup
Section titled “Job Retention and Cleanup”Job metadata and output files persist indefinitely by default. There is currently no automatic retention policy or cleanup mechanism built into herdctl. Both the .yaml metadata file and .jsonl output file remain in .herdctl/jobs/ until manually removed.
Manual Cleanup
Section titled “Manual Cleanup”The deleteJob() function removes a job’s metadata file:
import { deleteJob } from "@herdctl/core";
const deleted = await deleteJob(jobsDir, "job-2024-01-19-abc123");This deletes only the .yaml file. The corresponding .jsonl output file must be removed separately if desired.
Listing and Filtering Jobs
Section titled “Listing and Filtering Jobs”The listJobs() function supports filtering to help identify jobs for cleanup:
import { listJobs } from "@herdctl/core";
// Find all failed jobs older than 7 daysconst sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);const { jobs, errors } = await listJobs(jobsDir, { status: "failed", startedBefore: sevenDaysAgo,});Available filters:
| Filter | Type | Description |
|---|---|---|
agent | string | Filter by agent name |
status | JobStatus | Filter by job status |
startedAfter | string or Date | Jobs started on or after this date |
startedBefore | string or Date | Jobs started on or before this date |
Results are sorted by started_at in descending order (most recent first). The errors count indicates how many job files failed to parse (corrupted or incompatible format).
Job Events
Section titled “Job Events”The FleetManager emits strongly-typed events throughout a job’s lifecycle. These events drive the web dashboard, CLI output, and any programmatic subscribers.
Event Timeline
Section titled “Event Timeline”A typical job lifecycle emits events in this order:
job:created ──► Job record written, status is "pending" (execution begins)job:output ──► Repeated for each output chunk during executionjob:completed ──► Job finished successfully orjob:failed ──► Job ended with an error orjob:cancelled ──► Job was cancelledFork operations emit an additional event:
job:created ──► New forked job record writtenjob:forked ──► Links the new job to its parentEvent Payloads
Section titled “Event Payloads”| Event | Payload | Key Fields |
|---|---|---|
job:created | JobCreatedPayload | job (full metadata), agentName, scheduleName |
job:output | JobOutputPayload | jobId, agentName, output (content chunk), outputType |
job:completed | JobCompletedPayload | job, agentName, exitReason, durationSeconds |
job:failed | JobFailedPayload | job, agentName, error, exitReason |
job:cancelled | JobCancelledPayload | job, agentName, terminationType, durationSeconds |
job:forked | JobForkedPayload | job (new job), originalJob, agentName |
Subscribing to Events
Section titled “Subscribing to Events”const manager = new FleetManager({ stateDir: ".herdctl" });
manager.on("job:created", (payload) => { console.log(`Job ${payload.job.id} created for ${payload.agentName}`);});
manager.on("job:completed", (payload) => { console.log(`Job ${payload.job.id} completed in ${payload.durationSeconds}s`);});
manager.on("job:failed", (payload) => { console.error(`Job ${payload.job.id} failed: ${payload.error.message}`);});All event payloads include a timestamp field (ISO string) indicating when the event occurred. The event map is defined with full TypeScript generics in FleetManagerEventMap, providing type-safe on() and emit() calls.
Job Cancellation
Section titled “Job Cancellation”Jobs can be cancelled through the JobControl.cancelJob() method. Cancellation updates the job’s status to cancelled with exit reason cancelled.
Cancellation Flow
Section titled “Cancellation Flow”- The method verifies the job exists and is in a
runningorpendingstate. - If the job is already in a terminal state, it returns
terminationType: "already_stopped"without modification. - For running jobs, the metadata is updated to
status: "cancelled"andexit_reason: "cancelled". - A
job:cancelledevent is emitted with the termination type and duration.
Bulk Cancellation
Section titled “Bulk Cancellation”During fleet shutdown, cancelRunningJobs() cancels all currently running jobs in parallel. Each agent’s currentJobId is collected and cancelled concurrently. Individual cancellation failures are logged as warnings but do not block shutdown of other jobs.
Source Code Layout
Section titled “Source Code Layout”The job system spans several files in packages/core/src/:
packages/core/src/├── state/│ ├── schemas/│ │ ├── job-metadata.ts # JobMetadataSchema, generateJobId(), createJobMetadata()│ │ └── job-output.ts # JobOutputMessageSchema, JobOutputInput types│ ├── job-metadata.ts # createJob(), updateJob(), getJob(), listJobs(), deleteJob()│ └── job-output.ts # appendJobOutput(), readJobOutput(), readJobOutputAll()└── fleet-manager/ ├── job-control.ts # JobControl class: trigger(), cancelJob(), forkJob() └── event-types.ts # Job event payload type definitionsRelated Pages
Section titled “Related Pages”Architecture
Section titled “Architecture”- System Architecture Overview — Job system’s role in the overall architecture
- Agent Execution Engine — How the Runner executes jobs and streams output
- State Persistence — How job files are stored on disk
- Schedule System — How schedules create jobs
- Chat Infrastructure — How chat messages trigger jobs