State Management
herdctl uses a file-based state system to track agents, jobs, and sessions. All state is stored in the .herdctl/ directory—no database required.
Directory Structure
```
.herdctl/
├── state.yaml                        # Fleet state (agent status, schedules)
├── jobs/
│   ├── job-2024-01-19-abc123.yaml    # Job metadata
│   └── job-2024-01-19-abc123.jsonl   # Streaming output log
├── sessions/
│   └── <agent-name>.json             # Session info per agent
└── logs/
    └── <agent-name>.log              # Agent-level logs
```

Subdirectory Purposes
| Path | Purpose |
|---|---|
| `state.yaml` | Fleet-wide state, including agent status and scheduling |
| `jobs/` | Individual job metadata (YAML) and streaming output (JSONL) |
| `sessions/` | Claude session information for resume/fork capability |
| `logs/` | Agent-level logs for debugging |
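To make the layout concrete, here is a small TypeScript sketch that maps an agent name or job ID to its file under `.herdctl/`. The helper names (`stateFilePath`, `jobFilePaths`, `sessionFilePath`, `agentLogPath`) are hypothetical and only restate the tree above; herdctl's own path handling may differ.

```typescript
import * as path from "node:path";

// Hypothetical helpers that mirror the documented .herdctl/ layout.
// herdctl's own code may name or organize these paths differently.
const STATE_DIR = ".herdctl";

interface JobFiles {
  metadata: string; // jobs/<job-id>.yaml
  output: string; // jobs/<job-id>.jsonl
}

function stateFilePath(projectRoot: string): string {
  return path.join(projectRoot, STATE_DIR, "state.yaml");
}

function jobFilePaths(projectRoot: string, jobId: string): JobFiles {
  const jobsDir = path.join(projectRoot, STATE_DIR, "jobs");
  return {
    metadata: path.join(jobsDir, `${jobId}.yaml`),
    output: path.join(jobsDir, `${jobId}.jsonl`),
  };
}

function sessionFilePath(projectRoot: string, agentName: string): string {
  return path.join(projectRoot, STATE_DIR, "sessions", `${agentName}.json`);
}

function agentLogPath(projectRoot: string, agentName: string): string {
  return path.join(projectRoot, STATE_DIR, "logs", `${agentName}.log`);
}

// Example: both files for one job sit side by side in jobs/.
const files = jobFilePaths(process.cwd(), "job-2024-01-19-abc123");
console.log(files.metadata, files.output);
```

Because the two job files share the job ID as their base name, either one can be located from the other without an index.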
state.yaml Format
The `state.yaml` file tracks fleet-wide state using the `FleetStateSchema`:

```yaml
fleet:
  started_at: "2025-01-19T10:00:00Z"  # ISO timestamp when fleet started

agents:
  bragdoc-coder:
    status: idle                # idle | running | error
    current_job: null           # Job ID if currently running
    last_job: job-2024-01-19-abc123
    next_schedule: issue-check
    next_trigger_at: "2025-01-19T10:10:00Z"
    container_id: null          # Docker container ID (if using Docker)
    error_message: null         # Error message if status is 'error'

  bragdoc-marketer:
    status: running
    current_job: job-2024-01-19-def456
    last_job: job-2024-01-19-xyz789
    next_schedule: null
    next_trigger_at: null
    container_id: "def456"
```

Agent State Fields
| Field | Type | Description |
|---|---|---|
| `status` | enum | Current agent status: `idle`, `running`, or `error` |
| `current_job` | `string?` | ID of the currently running job |
| `last_job` | `string?` | ID of the last completed job |
| `next_schedule` | `string?` | Name of the next scheduled trigger |
| `next_trigger_at` | `string?` | ISO timestamp of the next scheduled run |
| `container_id` | `string?` | Docker container ID (if containerized) |
| `error_message` | `string?` | Error message when `status` is `error` |
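The field table translates naturally into a TypeScript type plus a runtime check. The sketch below is an assumption-based illustration, not herdctl's `FleetStateSchema`: the names `AgentState` and `isAgentState` are hypothetical, each `string?` field is treated as "absent, `null`, or a string", and YAML parsing is left out.

```typescript
// Hypothetical TypeScript mirror of one entry under `agents:` in state.yaml,
// built from the field table above. herdctl's FleetStateSchema may differ.
type AgentStatus = "idle" | "running" | "error";

interface AgentState {
  status: AgentStatus;
  current_job?: string | null;
  last_job?: string | null;
  next_schedule?: string | null;
  next_trigger_at?: string | null; // ISO timestamp
  container_id?: string | null;
  error_message?: string | null;
}

const AGENT_STATUSES: readonly string[] = ["idle", "running", "error"];
const OPTIONAL_STRING_FIELDS = [
  "current_job",
  "last_job",
  "next_schedule",
  "next_trigger_at",
  "container_id",
  "error_message",
] as const;

// Runtime check for a value produced by a YAML parser (parsing itself not shown).
function isAgentState(value: unknown): value is AgentState {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const status = record.status;
  if (typeof status !== "string" || !AGENT_STATUSES.includes(status)) return false;
  // Every optional field must be missing, null, or a string.
  return OPTIONAL_STRING_FIELDS.every((field) => {
    const fieldValue = record[field];
    return fieldValue === undefined || fieldValue === null || typeof fieldValue === "string";
  });
}

// Example: the bragdoc-coder entry from the YAML above.
console.log(isAgentState({ status: "idle", current_job: null, last_job: "job-2024-01-19-abc123" })); // true
```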
Job File Formats
Each job creates two files in `.herdctl/jobs/`:
Job Metadata (YAML)
```yaml
id: job-2024-01-19-abc123
agent: bragdoc-marketer
schedule: daily-analytics
trigger_type: schedule      # manual | schedule | webhook | chat | fork

status: completed           # pending | running | completed | failed | cancelled
exit_reason: success        # success | error | timeout | cancelled | max_turns

session_id: claude-session-xyz789
forked_from: null           # Parent job ID if this was forked

started_at: "2024-01-19T09:00:00Z"
finished_at: "2024-01-19T09:05:23Z"
duration_seconds: 323

prompt: |
  Analyze site traffic for the past 24 hours.
  Create a brief report and post to #marketing channel.

summary: "Generated daily analytics report. Traffic up 12% from yesterday."
output_file: job-2024-01-19-abc123.jsonl
```

Job Metadata Fields
| Field | Type | Description |
|---|---|---|
| `id` | `string` | Format: `job-YYYY-MM-DD-<random6>` |
| `agent` | `string` | Name of the executing agent |
| `schedule` | `string?` | Schedule name that triggered the job |
| `trigger_type` | enum | How the job was started: `manual`, `schedule`, `webhook`, `chat`, or `fork` |
| `status` | enum | Current job status: `pending`, `running`, `completed`, `failed`, or `cancelled` |
| `exit_reason` | enum | Why the job ended (set once it finishes): `success`, `error`, `timeout`, `cancelled`, or `max_turns` |
| `session_id` | `string?` | Claude session ID for resume/fork |
| `forked_from` | `string?` | Parent job ID (for forked jobs) |
| `started_at` | `string` | ISO timestamp when the job started |
| `finished_at` | `string?` | ISO timestamp when the job finished |
| `duration_seconds` | `number?` | Total execution time in seconds |
| `prompt` | `string?` | The prompt given to the agent |
| `summary` | `string?` | Brief summary of the job results |
| `output_file` | `string?` | Path to the JSONL output file |
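The ID format and the `duration_seconds` field can be illustrated with a short sketch. It assumes the six-character suffix is lowercase alphanumeric (as in `abc123`) and that the date part is the UTC date; `createJobId`, `isJobId`, and `durationSeconds` are hypothetical names. In the metadata example above, 09:00:00 to 09:05:23 is 5 * 60 + 23 = 323 seconds.

```typescript
import { randomInt } from "node:crypto";

// Assumption: the six-character suffix uses lowercase letters and digits,
// matching examples such as "abc123"; herdctl's real alphabet may differ.
const SUFFIX_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
const JOB_ID_PATTERN = /^job-\d{4}-\d{2}-\d{2}-[a-z0-9]{6}$/;

// Builds an ID of the form job-YYYY-MM-DD-<random6> from the UTC date.
function createJobId(now: Date = new Date()): string {
  const date = now.toISOString().slice(0, 10); // "YYYY-MM-DD"
  let suffix = "";
  for (let i = 0; i < 6; i++) {
    suffix += SUFFIX_ALPHABET[randomInt(SUFFIX_ALPHABET.length)];
  }
  return `job-${date}-${suffix}`;
}

function isJobId(id: string): boolean {
  return JOB_ID_PATTERN.test(id);
}

// duration_seconds derived from the two ISO timestamps, rounded to whole seconds.
function durationSeconds(startedAt: string, finishedAt: string): number {
  return Math.round((Date.parse(finishedAt) - Date.parse(startedAt)) / 1000);
}

// Prints "job-2024-01-19-" followed by six random characters.
console.log(createJobId(new Date("2024-01-19T09:00:00Z")));
console.log(durationSeconds("2024-01-19T09:00:00Z", "2024-01-19T09:05:23Z")); // 323
```

`randomInt` from `node:crypto` is used instead of `Math.random` so the suffix draws from a cryptographically strong source, which lowers the odds of two jobs on the same day colliding.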
Streaming Output (JSONL)
Job output is stored as newline-delimited JSON (JSONL) for efficient streaming:

```jsonl
{"type":"system","subtype":"init","timestamp":"2024-01-19T09:00:00Z"}
{"type":"assistant","content":"I'll analyze the traffic data...","timestamp":"2024-01-19T09:00:01Z"}
{"type":"tool_use","tool_name":"Bash","input":"node scripts/get-analytics.js","timestamp":"2024-01-19T09:00:02Z"}
{"type":"tool_result","result":"...analytics output...","success":true,"timestamp":"2024-01-19T09:00:05Z"}
{"type":"assistant","content":"Traffic is up 12% from yesterday...","timestamp":"2024-01-19T09:00:10Z"}
```

JSONL Message Types

All messages include a `type` field and a `timestamp`. The five message types are:
system
System events, such as session initialization:

```json
{
  "type": "system",
  "subtype": "init",
  "content": "Session initialized",
  "timestamp": "2024-01-19T09:00:00Z"
}
```

assistant
Claude’s text responses:

```json
{
  "type": "assistant",
  "content": "I'll analyze the traffic data...",
  "partial": false,
  "usage": {
    "input_tokens": 1500,
    "output_tokens": 200
  },
  "timestamp": "2024-01-19T09:00:01Z"
}
```

tool_use
Tool invocations by Claude:

```json
{
  "type": "tool_use",
  "tool_name": "Bash",
  "tool_use_id": "toolu_abc123",
  "input": "gh issue list --label ready --json number,title",
  "timestamp": "2024-01-19T09:00:02Z"
}
```

tool_result
Results from tool execution:

```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_abc123",
  "result": "[{\"number\":42,\"title\":\"Fix auth timeout\"}]",
  "success": true,
  "error": null,
  "timestamp": "2024-01-19T09:00:05Z"
}
```

error

Error messages:

```json
{
  "type": "error",
  "message": "Tool execution failed",
  "code": "TOOL_ERROR",
  "stack": "...",
  "timestamp": "2024-01-19T09:00:05Z"
}
```

Session Info Format
Session files track Claude session state for resume/fork capability:

```json
{
  "agent_name": "bragdoc-coder",
  "session_id": "claude-session-xyz789",
  "created_at": "2024-01-19T08:00:00Z",
  "last_used_at": "2024-01-19T10:05:00Z",
  "job_count": 15,
  "mode": "autonomous"
}
```

Session Modes
| Mode | Description |
|---|---|
| `autonomous` | Agent runs independently |
| `interactive` | Human-in-the-loop mode |
| `review` | Actions require review/approval |
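A session file can be validated on load before a resume is attempted. This sketch assumes the file contains exactly the fields from the example above and that the three documented modes are the only valid ones; `SessionInfo` and `parseSessionInfo` are hypothetical names, not herdctl APIs.

```typescript
// Hypothetical shape of .herdctl/sessions/<agent-name>.json, taken from the
// example above rather than from herdctl's source.
type SessionMode = "autonomous" | "interactive" | "review";

interface SessionInfo {
  agent_name: string;
  session_id: string;
  created_at: string; // ISO timestamp
  last_used_at: string; // ISO timestamp
  job_count: number;
  mode: SessionMode;
}

const SESSION_MODES: readonly string[] = ["autonomous", "interactive", "review"];
const STRING_FIELDS = ["agent_name", "session_id", "created_at", "last_used_at"];

// Parses a session file's text and throws if a field is missing or has the wrong type.
function parseSessionInfo(text: string): SessionInfo {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("session file: expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  for (const key of STRING_FIELDS) {
    if (typeof record[key] !== "string") {
      throw new Error(`session file: "${key}" must be a string`);
    }
  }
  if (typeof record.job_count !== "number") {
    throw new Error('session file: "job_count" must be a number');
  }
  const mode = record.mode;
  if (typeof mode !== "string" || !SESSION_MODES.includes(mode)) {
    throw new Error(`session file: unknown mode ${String(mode)}`);
  }
  return record as unknown as SessionInfo;
}

const info = parseSessionInfo(
  '{"agent_name":"bragdoc-coder","session_id":"claude-session-xyz789",' +
    '"created_at":"2024-01-19T08:00:00Z","last_used_at":"2024-01-19T10:05:00Z",' +
    '"job_count":15,"mode":"autonomous"}',
);
console.log(info.mode, info.job_count); // autonomous 15
```

Failing loudly here gives a clearer signal than a failed resume later, which is one way the "Session won’t resume" case under Common Issues can surface.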
Atomic Writes for Safety
All state file operations use atomic writes to prevent corruption:

- Write to temp file: Content is written to `.<filename>.tmp.<random>` in the same directory.
- Atomic rename: The temp file is renamed to the target (atomic on POSIX systems).
- Cleanup on failure: Temp files are cleaned up if the write fails.
This pattern ensures:
- No partial writes: Files are never in an incomplete state
- Crash safety: If the process crashes mid-write, the original file remains intact
- Concurrent safety: Multiple readers won’t see incomplete data
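The temp-write-then-rename steps can be sketched with Node's `fs/promises`. This is a minimal illustration, not herdctl's implementation: `writeFileAtomic` is a hypothetical name, and the 12-hex-character random suffix is an assumption about what `<random>` looks like.

```typescript
import { randomBytes } from "node:crypto";
import { rename, unlink, writeFile } from "node:fs/promises";
import * as path from "node:path";

// Minimal sketch of the write-temp-then-rename pattern described above.
async function writeFileAtomic(targetPath: string, content: string): Promise<void> {
  const dir = path.dirname(targetPath);
  const base = path.basename(targetPath);
  // Temp file lives next to the target: .<filename>.tmp.<random>
  const tempPath = path.join(dir, `.${base}.tmp.${randomBytes(6).toString("hex")}`);
  try {
    await writeFile(tempPath, content, "utf8");
    // Replaces the target in one step on POSIX filesystems.
    await rename(tempPath, targetPath);
  } catch (err) {
    // Best-effort cleanup; ignore the error if the temp file was never created.
    await unlink(tempPath).catch(() => undefined);
    throw err;
  }
}
```

Keeping the temp file in the same directory matters: `rename` is only atomic within a single filesystem, and across filesystems it fails with `EXDEV`.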
JSONL Appends
JSONL files are written with `fs.appendFile`, which is atomic at the message level on most systems. Each line is a complete, self-contained JSON object.
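The append-and-read cycle for a job's JSONL log can be sketched as follows. The `JobMessage` union covers the five documented types using only the fields shown in the examples; it is an assumed shape, and `appendMessage`/`readMessages` are hypothetical helpers built on Node's real `appendFile` and `readFile` (the promise-based forms of the calls named above).

```typescript
import { appendFile, readFile } from "node:fs/promises";

// Hypothetical union of the five documented message types, using only the
// fields shown in the examples above; real messages may carry more.
type JobMessage =
  | { type: "system"; subtype?: string; content?: string; timestamp: string }
  | {
      type: "assistant";
      content: string;
      partial?: boolean;
      usage?: { input_tokens: number; output_tokens: number };
      timestamp: string;
    }
  | { type: "tool_use"; tool_name: string; tool_use_id?: string; input: string; timestamp: string }
  | {
      type: "tool_result";
      tool_use_id?: string;
      result: string;
      success: boolean;
      error?: string | null;
      timestamp: string;
    }
  | { type: "error"; message: string; code?: string; stack?: string; timestamp: string };

// One appendFile call per message. JSON.stringify (without an indent argument)
// escapes newlines inside strings, so every message occupies exactly one line.
async function appendMessage(file: string, message: JobMessage): Promise<void> {
  await appendFile(file, JSON.stringify(message) + "\n", "utf8");
}

// Reads the log back, skipping blank lines such as the trailing newline.
async function readMessages(file: string): Promise<JobMessage[]> {
  const text = await readFile(file, "utf8");
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as JobMessage);
}
```

Because `type` is a literal discriminant, a `switch (message.type)` over the parsed array narrows each case to its own fields.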
Windows Compatibility
On Windows, the rename operation includes retry logic with exponential backoff to handle file locking (`EACCES`/`EPERM` errors).
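A retry loop of this kind might look like the sketch below. The attempt count (5), base delay (50 ms), and the `renameWithRetry` name are assumptions; the `renameFn` parameter exists only so the retry behavior can be exercised without a real locked file. A caller could restrict it to Windows by checking `process.platform === "win32"`.

```typescript
import { rename } from "node:fs/promises";

// Sketch of rename-with-retry for transient Windows file locks.
// Attempt count and delays are assumptions, not herdctl's values.
type RenameFn = (from: string, to: string) => Promise<void>;

const RETRYABLE_CODES = new Set(["EACCES", "EPERM"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function renameWithRetry(
  from: string,
  to: string,
  maxAttempts = 5,
  baseDelayMs = 50,
  renameFn: RenameFn = rename,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await renameFn(from, to);
      return;
    } catch (err) {
      const code = (err as { code?: string }).code;
      // Only lock-related errors are retried; anything else fails immediately.
      if (!code || !RETRYABLE_CODES.has(code) || attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Errors such as `ENOENT` are not retried, since waiting cannot make a missing source file appear.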
Debugging State Issues
Inspect Current State

```bash
# View fleet state
cat .herdctl/state.yaml

# View a specific job
cat .herdctl/jobs/job-2024-01-19-abc123.yaml

# View job output
cat .herdctl/jobs/job-2024-01-19-abc123.jsonl

# View session info
cat .herdctl/sessions/bragdoc-coder.json
```

Common Issues
| Symptom | Likely Cause | Solution |
|---|---|---|
| Agent stuck in `running` | Process crashed during a job | Reset the agent's `status` (and clear `current_job`) in `state.yaml` |
| Missing job output | Write failure | Check disk space and permissions |
| Session won’t resume | Invalid session ID | Clear the agent's session file in `sessions/` |
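For the first row, resetting a stuck agent amounts to editing its entry in `state.yaml`. The sketch below shows that edit on an already-parsed state object; it assumes `status` and `current_job` are the only fields that need to change, and `resetStuckAgent` is a hypothetical helper rather than a herdctl command. Loading and saving the YAML (ideally with an atomic write) is omitted.

```typescript
// Hypothetical in-memory version of the "reset agent status" fix.
// Which fields herdctl expects to be reset is an assumption.
interface AgentEntry {
  status: "idle" | "running" | "error";
  current_job: string | null;
  [field: string]: unknown;
}

interface FleetStateDoc {
  agents: Record<string, AgentEntry>;
  [field: string]: unknown;
}

// Returns a new state object; the input is left unchanged.
function resetStuckAgent(state: FleetStateDoc, agentName: string): FleetStateDoc {
  const agent = state.agents[agentName];
  if (agent === undefined) throw new Error(`unknown agent: ${agentName}`);
  if (agent.status !== "running") return state; // nothing to reset
  const updated: AgentEntry = { ...agent, status: "idle", current_job: null };
  return { ...state, agents: { ...state.agents, [agentName]: updated } };
}
```

Other fields such as `last_job` and `next_schedule` are carried over untouched by the spread.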
Related Concepts
- Sessions - Understanding session persistence
- Jobs - Job lifecycle and management
- Workspaces - Where agents operate