State Management

herdctl uses a file-based state system to track agents, jobs, and sessions. All state is stored in the .herdctl/ directory, so no database is required.

.herdctl/
├── state.yaml                        # Fleet state (agent status, schedules)
├── jobs/
│   ├── job-2024-01-19-abc123.yaml    # Job metadata
│   └── job-2024-01-19-abc123.jsonl   # Streaming output log
├── sessions/
│   └── <agent-name>.json             # Session info per agent
└── logs/
    └── <agent-name>.log              # Agent-level logs

| Path | Purpose |
|---|---|
| state.yaml | Fleet-wide state including agent status and scheduling |
| jobs/ | Individual job metadata (YAML) and streaming output (JSONL) |
| sessions/ | Claude session information for resume/fork capability |
| logs/ | Agent-level logs for debugging |

The state.yaml file tracks fleet-wide state using the FleetStateSchema:

.herdctl/state.yaml
fleet:
  started_at: "2025-01-19T10:00:00Z"   # ISO timestamp when fleet started
agents:
  bragdoc-coder:
    status: idle                        # idle | running | error
    current_job: null                   # Job ID if currently running
    last_job: job-2024-01-19-abc123
    next_schedule: issue-check
    next_trigger_at: "2025-01-19T10:10:00Z"
    container_id: null                  # Docker container ID (if using Docker)
    error_message: null                 # Error message if status is 'error'
  bragdoc-marketer:
    status: running
    current_job: job-2024-01-19-def456
    last_job: job-2024-01-19-xyz789
    next_schedule: null
    next_trigger_at: null
    container_id: "def456"

| Field | Type | Description |
|---|---|---|
| status | idle \| running \| error | Current agent status |
| current_job | string? | ID of currently running job |
| last_job | string? | ID of the last completed job |
| next_schedule | string? | Name of the next scheduled trigger |
| next_trigger_at | string? | ISO timestamp of next scheduled run |
| container_id | string? | Docker container ID (if containerized) |
| error_message | string? | Error message when status is error |
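
As a rough illustration of how the fields above fit together, the sketch below models the parsed contents of state.yaml as TypeScript types and lists the agents currently marked as running. The type names and the `runningAgents` helper are hypothetical and only mirror the documented fields; the actual FleetStateSchema in herdctl may differ.

```typescript
// Hypothetical types mirroring the documented fields; not herdctl's own schema.
type AgentStatus = "idle" | "running" | "error";

interface AgentState {
  status: AgentStatus;
  current_job?: string | null;
  last_job?: string | null;
  next_schedule?: string | null;
  next_trigger_at?: string | null;
  container_id?: string | null;
  error_message?: string | null;
}

interface FleetState {
  fleet: { started_at: string };
  agents: { [name: string]: AgentState };
}

// Return every agent whose status is "running", with the job it reports.
function runningAgents(state: FleetState): Array<{ name: string; job: string | null }> {
  return Object.keys(state.agents)
    .filter((name) => state.agents[name].status === "running")
    .map((name) => ({ name, job: state.agents[name].current_job ?? null }));
}

// Example data taken from the state.yaml sample above (already parsed).
const exampleState: FleetState = {
  fleet: { started_at: "2025-01-19T10:00:00Z" },
  agents: {
    "bragdoc-coder": { status: "idle", current_job: null },
    "bragdoc-marketer": { status: "running", current_job: "job-2024-01-19-def456" },
  },
};

console.log(runningAgents(exampleState));
```

A helper like this can be useful when checking for agents that still report running after a crash.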

Each job creates two files in .herdctl/jobs/:

.herdctl/jobs/job-2024-01-19-abc123.yaml
id: job-2024-01-19-abc123
agent: bragdoc-marketer
schedule: daily-analytics
trigger_type: schedule # manual | schedule | webhook | chat | fork
status: completed # pending | running | completed | failed | cancelled
exit_reason: success # success | error | timeout | cancelled | max_turns
session_id: claude-session-xyz789
forked_from: null # Parent job ID if this was forked
started_at: "2024-01-19T09:00:00Z"
finished_at: "2024-01-19T09:05:23Z"
duration_seconds: 323
prompt: |
  Analyze site traffic for the past 24 hours.
  Create a brief report and post to #marketing channel.
summary: "Generated daily analytics report. Traffic up 12% from yesterday."
output_file: job-2024-01-19-abc123.jsonl

| Field | Type | Description |
|---|---|---|
| id | string | Format: job-YYYY-MM-DD-<random6> |
| agent | string | Name of the executing agent |
| schedule | string? | Schedule name that triggered the job |
| trigger_type | enum | How the job was started |
| status | enum | Current job status |
| exit_reason | enum | Why the job ended (when finished) |
| session_id | string? | Claude session ID for resume/fork |
| forked_from | string? | Parent job ID (for forked jobs) |
| started_at | string | ISO timestamp when job started |
| finished_at | string? | ISO timestamp when job finished |
| duration_seconds | number? | Total execution time |
| prompt | string? | The prompt given to the agent |
| summary | string? | Brief summary of job results |
| output_file | string? | Path to the JSONL output file |
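
The id format and the relationship between the timestamps and duration_seconds can be checked with a short sketch. Both helpers are hypothetical. The pattern assumes that `<random6>` is six lowercase letters or digits, as in the sample IDs, and rounding to whole seconds is also an assumption.

```typescript
// Assumption: <random6> is six lowercase letters or digits, as in "abc123".
const JOB_ID_PATTERN = /^job-\d{4}-\d{2}-\d{2}-[a-z0-9]{6}$/;

// Check that a string follows the documented job-YYYY-MM-DD-<random6> shape.
function isJobId(id: string): boolean {
  return JOB_ID_PATTERN.test(id);
}

// Derive duration_seconds from started_at and finished_at.
// Returns null while the job is unfinished or if a timestamp cannot be parsed.
function durationSeconds(startedAt: string, finishedAt?: string | null): number | null {
  if (!finishedAt) return null;
  const elapsedMs = Date.parse(finishedAt) - Date.parse(startedAt);
  return Number.isNaN(elapsedMs) ? null : Math.round(elapsedMs / 1000);
}

console.log(isJobId("job-2024-01-19-abc123"));
console.log(durationSeconds("2024-01-19T09:00:00Z", "2024-01-19T09:05:23Z")); // 323, matching the sample
```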

Job output is stored as newline-delimited JSON (JSONL) for efficient streaming:

{"type":"system","subtype":"init","timestamp":"2024-01-19T09:00:00Z"}
{"type":"assistant","content":"I'll analyze the traffic data...","timestamp":"2024-01-19T09:00:01Z"}
{"type":"tool_use","tool_name":"Bash","input":"node scripts/get-analytics.js","timestamp":"2024-01-19T09:00:02Z"}
{"type":"tool_result","result":"...analytics output...","success":true,"timestamp":"2024-01-19T09:00:05Z"}
{"type":"assistant","content":"Traffic is up 12% from yesterday...","timestamp":"2024-01-19T09:00:10Z"}

All messages include a type field and a timestamp. The five message types are system, assistant, tool_use, tool_result, and error, described in turn below.

System events like session initialization:

{
  "type": "system",
  "subtype": "init",
  "content": "Session initialized",
  "timestamp": "2024-01-19T09:00:00Z"
}

Claude’s text responses:

{
  "type": "assistant",
  "content": "I'll analyze the traffic data...",
  "partial": false,
  "usage": {
    "input_tokens": 1500,
    "output_tokens": 200
  },
  "timestamp": "2024-01-19T09:00:01Z"
}

Tool invocations by Claude:

{
  "type": "tool_use",
  "tool_name": "Bash",
  "tool_use_id": "toolu_abc123",
  "input": "gh issue list --label ready --json number,title",
  "timestamp": "2024-01-19T09:00:02Z"
}

Results from tool execution:

{
  "type": "tool_result",
  "tool_use_id": "toolu_abc123",
  "result": "[{\"number\":42,\"title\":\"Fix auth timeout\"}]",
  "success": true,
  "error": null,
  "timestamp": "2024-01-19T09:00:05Z"
}

Error messages:

{
  "type": "error",
  "message": "Tool execution failed",
  "code": "TOOL_ERROR",
  "stack": "...",
  "timestamp": "2024-01-19T09:00:05Z"
}
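
To read a job's output back, each line can be parsed independently. The sketch below is one possible reader, not herdctl's own code: the `JobMessage` union and `parseJobOutput` are hypothetical names, the optional fields are inferred from the samples above, and skipping unparseable lines (for example a line still being written) is an assumed policy.

```typescript
// Hypothetical union built from the five documented message shapes.
type JobMessage =
  | { type: "system"; subtype?: string; content?: string; timestamp: string }
  | { type: "assistant"; content: string; partial?: boolean;
      usage?: { input_tokens: number; output_tokens: number }; timestamp: string }
  | { type: "tool_use"; tool_name: string; tool_use_id?: string; input?: unknown; timestamp: string }
  | { type: "tool_result"; tool_use_id?: string; result?: unknown; success?: boolean;
      error?: string | null; timestamp: string }
  | { type: "error"; message: string; code?: string; stack?: string; timestamp: string };

const MESSAGE_TYPES = ["system", "assistant", "tool_use", "tool_result", "error"];

// Parse JSONL text into messages, one JSON object per non-empty line.
// Lines that are not valid JSON, or that lack a known type and a timestamp, are skipped.
function parseJobOutput(text: string): JobMessage[] {
  const messages: JobMessage[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // assumed policy: ignore malformed or partially written lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as { type?: unknown; timestamp?: unknown };
    if (
      typeof record.type === "string" &&
      MESSAGE_TYPES.indexOf(record.type) !== -1 &&
      typeof record.timestamp === "string"
    ) {
      messages.push(parsed as JobMessage);
    }
  }
  return messages;
}

const sample =
  '{"type":"system","subtype":"init","timestamp":"2024-01-19T09:00:00Z"}\n' +
  '{"type":"assistant","content":"I\'ll analyze the traffic data...","timestamp":"2024-01-19T09:00:01Z"}\n';

for (const message of parseJobOutput(sample)) {
  console.log(message.timestamp, message.type);
}
```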

Session files track Claude session state for resume/fork capability:

.herdctl/sessions/bragdoc-coder.json
{
  "agent_name": "bragdoc-coder",
  "session_id": "claude-session-xyz789",
  "created_at": "2024-01-19T08:00:00Z",
  "last_used_at": "2024-01-19T10:05:00Z",
  "job_count": 15,
  "mode": "autonomous"
}

| Mode | Description |
|---|---|
| autonomous | Agent runs independently |
| interactive | Human-in-the-loop mode |
| review | Review/approval required for actions |
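
A session file can be checked before it is used. The following sketch validates the documented fields and the three modes. `SessionInfo` and `parseSessionInfo` are hypothetical names, and returning null for anything malformed is an assumed policy rather than herdctl's documented behavior.

```typescript
type SessionMode = "autonomous" | "interactive" | "review";

// Hypothetical type mirroring the sample session file above.
interface SessionInfo {
  agent_name: string;
  session_id: string;
  created_at: string;
  last_used_at: string;
  job_count: number;
  mode: SessionMode;
}

const SESSION_MODES = ["autonomous", "interactive", "review"];
const STRING_FIELDS = ["agent_name", "session_id", "created_at", "last_used_at"];

// Parse the text of a session file; return null if it is not valid JSON
// or if any documented field is missing or has the wrong type.
function parseSessionInfo(json: string): SessionInfo | null {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as { [key: string]: unknown };
  for (const field of STRING_FIELDS) {
    if (typeof record[field] !== "string") return null;
  }
  if (typeof record["job_count"] !== "number") return null;
  const mode = record["mode"];
  if (typeof mode !== "string" || SESSION_MODES.indexOf(mode) === -1) return null;
  return data as SessionInfo;
}

const session = parseSessionInfo(
  '{"agent_name":"bragdoc-coder","session_id":"claude-session-xyz789",' +
    '"created_at":"2024-01-19T08:00:00Z","last_used_at":"2024-01-19T10:05:00Z",' +
    '"job_count":15,"mode":"autonomous"}',
);
console.log(session ? session.mode : "invalid session file");
```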

All state file operations use atomic writes to prevent corruption:

  1. Write to temp file: Content is written to .<filename>.tmp.<random> in the same directory
  2. Atomic rename: The temp file is renamed to the target (atomic on POSIX systems)
  3. Cleanup on failure: Temp files are cleaned up if the write fails

This pattern ensures:

  • No partial writes: Files are never in an incomplete state
  • Crash safety: If the process crashes mid-write, the original file remains intact
  • Concurrent safety: Multiple readers won’t see incomplete data
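
The three steps above can be sketched with Node's built-in fs module. This is a minimal illustration under stated assumptions, not herdctl's actual implementation: `atomicWriteFileSync` is a hypothetical name, the synchronous API is used for brevity, and the random suffix is taken from six random bytes.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Write content to targetPath via a temp file in the same directory, then rename.
function atomicWriteFileSync(targetPath: string, content: string): void {
  const dir = path.dirname(targetPath);
  const base = path.basename(targetPath);
  // Step 1: temp file named .<filename>.tmp.<random>, next to the target.
  const tempPath = path.join(dir, `.${base}.tmp.${randomBytes(6).toString("hex")}`);
  try {
    fs.writeFileSync(tempPath, content, "utf8");
    // Step 2: rename over the target (atomic on POSIX systems).
    fs.renameSync(tempPath, targetPath);
  } catch (err) {
    // Step 3: remove the temp file if anything failed, then re-throw.
    fs.rmSync(tempPath, { force: true });
    throw err;
  }
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "herdctl-demo-"));
const demoFile = path.join(demoDir, "state.yaml");
atomicWriteFileSync(demoFile, 'fleet:\n  started_at: "2025-01-19T10:00:00Z"\n');
console.log(fs.readFileSync(demoFile, "utf8"));
```

Keeping the temp file in the same directory matters because a rename is only atomic within a single filesystem.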

JSONL files are appended with fs.appendFile, one message per call. Each line is a complete, self-contained JSON object, and the append is atomic at the message level on most systems.
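
As a small sketch of the append pattern, the helper below serializes one message per call and adds a trailing newline. `appendMessage` is a hypothetical name, and the synchronous `fs.appendFileSync` stands in for `fs.appendFile` to keep the example short.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Append one message as a single JSONL line.
// JSON.stringify without an indent argument never emits a raw newline,
// so each call adds exactly one line to the file.
function appendMessage(outputPath: string, message: { [key: string]: unknown }): void {
  fs.appendFileSync(outputPath, JSON.stringify(message) + "\n", "utf8");
}

// Usage against a throwaway file.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), "herdctl-jsonl-"));
const logPath = path.join(logDir, "job-2024-01-19-abc123.jsonl");
appendMessage(logPath, { type: "system", subtype: "init", timestamp: "2024-01-19T09:00:00Z" });
appendMessage(logPath, {
  type: "assistant",
  content: "I'll analyze the traffic data...",
  timestamp: "2024-01-19T09:00:01Z",
});
console.log(fs.readFileSync(logPath, "utf8"));
```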

On Windows, the rename operation includes retry logic with exponential backoff to handle file locking (EACCES/EPERM errors).
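
A retry loop of this kind might look like the sketch below. The attempt count, the 50 ms base delay, the busy-wait sleep, and the `renameWithRetry` name are all assumptions chosen for illustration; the source only states that EACCES/EPERM errors are retried with exponential backoff.

```typescript
import * as fs from "node:fs";

// Blocking sleep so the sketch stays synchronous; real code would more likely await a timer.
function sleepSync(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait
  }
}

interface RetryOptions {
  attempts?: number;     // assumed default: 5 tries in total
  baseDelayMs?: number;  // assumed default: 50 ms, doubling after each failure
  rename?: (from: string, to: string) => void; // injectable for testing
  sleep?: (ms: number) => void;                // injectable for testing
}

// Rename a file, retrying only on EACCES/EPERM with exponential backoff.
function renameWithRetry(from: string, to: string, options: RetryOptions = {}): void {
  const attempts = options.attempts ?? 5;
  const baseDelayMs = options.baseDelayMs ?? 50;
  const rename = options.rename ?? fs.renameSync;
  const sleep = options.sleep ?? sleepSync;

  for (let attempt = 0; ; attempt++) {
    try {
      rename(from, to);
      return;
    } catch (err) {
      const code = (err as { code?: string }).code;
      const retryable = code === "EACCES" || code === "EPERM";
      // Give up on other errors, or once the last allowed attempt has failed.
      if (!retryable || attempt >= attempts - 1) throw err;
      sleep(baseDelayMs * Math.pow(2, attempt)); // 50, 100, 200, ...
    }
  }
}
```

Other error codes, such as ENOENT, are re-thrown immediately because waiting would not resolve them.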

# View fleet state
cat .herdctl/state.yaml
# View specific job
cat .herdctl/jobs/job-2024-01-19-abc123.yaml
# View job output
cat .herdctl/jobs/job-2024-01-19-abc123.jsonl
# View session info
cat .herdctl/sessions/bragdoc-coder.json

| Symptom | Likely Cause | Solution |
|---|---|---|
| Agent stuck in running | Crash during job | Reset agent status in state.yaml |
| Missing job output | Write failure | Check disk space, permissions |
| Session won’t resume | Invalid session ID | Clear session file |

  • Sessions - Understanding session persistence
  • Jobs - Job lifecycle and management
  • Workspaces - Where agents operate