Jobs

A Job represents a single execution of an agent. Each time an agent runs, whether started by a schedule, a manual invocation, or a trigger event, herdctl creates a job to track that execution from start to finish.

| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique job identifier (UUID) |
| agent | string | Name of the agent executing this job |
| schedule | string | Schedule that triggered this job (if scheduled) |
| status | enum | Current job status |
| exitReason | enum | Why the job ended (set on completion) |
| sessionId | string | Claude session ID for resume capability |
| startedAt | timestamp | When the job started execution |
| completedAt | timestamp | When the job finished (success or failure) |
| output | string | Path to job output file (JSONL format) |
| error | string | Error message if job failed |
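
For readers who consume job records programmatically, the properties above could be modeled roughly as follows. This is a minimal Python sketch, not herdctl's own type definition: `JobRecord` and `from_dict` are hypothetical names, and which fields are optional is an assumption inferred from the descriptions ("if scheduled", "set on completion", "if job failed").

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class JobRecord:
    """Hypothetical mirror of the job properties listed above."""

    id: str
    agent: str
    status: str
    schedule: Optional[str] = None     # assumed absent for non-scheduled jobs
    exitReason: Optional[str] = None   # set on completion
    sessionId: Optional[str] = None
    startedAt: Optional[str] = None    # timestamps kept as strings in this sketch
    completedAt: Optional[str] = None
    output: Optional[str] = None       # path to the JSONL output file
    error: Optional[str] = None        # assumed present only when the job failed

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "JobRecord":
        # Ignore unknown keys so the sketch tolerates extra fields.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


record = JobRecord.from_dict(
    {"id": "job-550e8400-e29b", "agent": "bragdoc-coder", "status": "running"}
)
print(record.schedule)  # None: no schedule was recorded for this job
```

Keeping the optional fields defaulted to `None` lets the same class represent a job at any point in its lifecycle.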

Jobs progress through a defined lifecycle:

PENDING → RUNNING → COMPLETED
                  → FAILED
                  → CANCELLED
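
The lifecycle can be expressed as a small transition table. The sketch below is illustrative only: it reads the lifecycle as RUNNING branching into the three terminal states, and it assumes a pending job cannot move directly to failed or cancelled, which this page does not state either way. The names `TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical.

```python
# Allowed transitions, as read from the lifecycle above (an interpretation).
TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the lifecycle permits moving from current to new."""
    return new in TRANSITIONS.get(current, set())


def is_terminal(status: str) -> bool:
    """A status is terminal when it is known and has no outgoing transitions."""
    return status in TRANSITIONS and not TRANSITIONS[status]


print(can_transition("pending", "running"))    # True
print(can_transition("completed", "running"))  # False
```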

The following diagram shows the full journey of a job from trigger to completion, including how the major components interact:

sequenceDiagram
    participant Trigger as Trigger<br/>(Schedule/Manual)
    participant Scheduler
    participant FM as FleetManager
    participant SE as ScheduleExecutor
    participant JE as JobExecutor
    participant RT as Runtime<br/>(SDK/CLI)
    participant State as StateManager

    Trigger->>Scheduler: Schedule is due / manual trigger
    activate Scheduler
    Scheduler->>Scheduler: Check: enabled, capacity, not running
    Note right of Scheduler: Skip if disabled,<br/>at capacity, or<br/>already running
    Scheduler->>FM: onTrigger(TriggerInfo)
    deactivate Scheduler

    activate FM
    FM->>SE: executeSchedule(info)
    activate SE
    SE->>SE: Resolve prompt from schedule or agent default
    SE->>JE: executor.execute(options)
    activate JE

    JE->>State: createJob(agent, trigger, prompt)
    State-->>JE: job record (status: pending)

    JE->>State: updateJob(status: running)

    opt Session resume requested
        JE->>State: getSessionInfo(agent)
        State-->>JE: session (validate expiry + working dir)
    end

    JE->>RT: runtime.execute(prompt, agent, resume?)
    activate RT
    Note over RT: Returns AsyncIterable of messages

    loop Stream SDK messages
        RT-->>JE: SDKMessage (system, assistant, tool_use, etc.)
        JE->>JE: processSDKMessage → JobOutput
        JE->>State: appendJobOutput(JSONL)
        JE-->>SE: onMessage callback (for events)
    end

    RT-->>JE: Terminal message (result or error)
    deactivate RT

    alt Success
        JE->>State: updateJob(status: completed, summary)
        JE->>State: updateSessionInfo(sessionId)
        JE-->>SE: RunnerResult(success: true)
    else Error / Failure
        JE->>State: updateJob(status: failed, error)
        JE-->>SE: RunnerResult(success: false, error)
    end
    deactivate JE

    SE->>FM: Emit job:completed or job:failed
    SE->>SE: Execute after_run / on_error hooks
    deactivate SE
    deactivate FM

The key participants in this flow are:

  • Trigger: A schedule firing (interval/cron) or a manual herdctl trigger command
  • Scheduler: Polls schedules and checks whether they are due, respecting concurrency limits
  • FleetManager: Top-level orchestrator that wires everything together
  • ScheduleExecutor: Handles the bridge between scheduler triggers and job execution
  • JobExecutor: Manages the full lifecycle of a single job — creating records, streaming output, and updating final status
  • Runtime: The execution backend (Claude Agent SDK or CLI) that actually runs the agent and returns a stream of messages
  • StateManager: Persists job metadata, JSONL output, and session info to .herdctl/
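
The core of the flow (create the record, mark it running, stream messages into the output log, then record the outcome) can be sketched in a few lines. This is a simplified, synchronous Python illustration, not herdctl's implementation: the diagram describes an asynchronous stream, and `InMemoryState`, `execute_job`, and `fake_runtime` are hypothetical stand-ins for the StateManager, JobExecutor, and Runtime.

```python
import json
from typing import Any, Callable, Iterable, Optional


class InMemoryState:
    """Stand-in for the StateManager; keeps everything in dictionaries."""

    def __init__(self) -> None:
        self.jobs: dict[str, dict[str, Any]] = {}
        self.output: dict[str, list[str]] = {}

    def create_job(self, agent: str, prompt: str) -> str:
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = {"agent": agent, "prompt": prompt, "status": "pending"}
        self.output[job_id] = []
        return job_id

    def update_job(self, job_id: str, **changes: Any) -> None:
        self.jobs[job_id].update(changes)

    def append_output(self, job_id: str, message: dict[str, Any]) -> None:
        # One JSON object per line, mirroring a JSONL output file.
        self.output[job_id].append(json.dumps(message))


def execute_job(
    state: InMemoryState,
    agent: str,
    prompt: str,
    runtime: Callable[[str], Iterable[dict[str, Any]]],
    on_message: Optional[Callable[[dict[str, Any]], None]] = None,
) -> str:
    """Create a job, stream runtime messages into state, record the outcome."""
    job_id = state.create_job(agent, prompt)
    state.update_job(job_id, status="running")
    try:
        for message in runtime(prompt):
            state.append_output(job_id, message)
            if on_message is not None:
                on_message(message)
        state.update_job(job_id, status="completed")
    except Exception as exc:  # any runtime failure marks the job as failed
        state.update_job(job_id, status="failed", error=str(exc))
    return job_id


def fake_runtime(prompt: str) -> Iterable[dict[str, Any]]:
    # Hypothetical runtime that yields two messages and then finishes.
    yield {"type": "assistant", "content": f"Working on: {prompt}"}
    yield {"type": "result", "content": "done"}


state = InMemoryState()
job_id = execute_job(state, "bragdoc-coder", "Summarize open PRs", fake_runtime)
print(state.jobs[job_id]["status"])  # completed
```

Session lookup, hooks, and event emission are left out to keep the sketch focused on the status updates and the output stream.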
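
A job's status is one of the following:

| Status | Description |
| --- | --- |
| pending | Job record has been created but execution has not started |
| running | Job is currently executing |
| completed | Job finished successfully |
| failed | Job terminated due to an error |
| cancelled | Job was manually stopped |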

When a job completes, it records an exit reason explaining why it ended:

| Exit Reason | Description |
| --- | --- |
| end_turn | Job completed naturally |
| stop_sequence | Job hit a stop sequence |
| max_turns | Job reached maximum conversation turns |
| timeout | Job exceeded its configured time limit |
| interrupt | Job was cancelled by user intervention |
| error | Job failed due to an error |

An example record for a completed job:

{
  "id": "job-550e8400-e29b",
  "agent": "bragdoc-coder",
  "schedule": "daily-standup",
  "status": "completed",
  "exitReason": "end_turn",
  "sessionId": "sess-a1b2c3d4",
  "startedAt": "2024-01-15T09:00:00Z",
  "completedAt": "2024-01-15T09:15:32Z",
  "output": "~/.herdctl/jobs/job-550e8400-e29b/output.jsonl"
}
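
The startedAt and completedAt timestamps make it straightforward to compute how long a job ran. The following Python sketch assumes the timestamps are ISO 8601 strings with a trailing `Z`, as in the example record; `parse_timestamp` and `job_duration` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Older Python versions reject 'Z' in fromisoformat, so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_duration(job: dict) -> timedelta:
    """Elapsed time between a job's startedAt and completedAt."""
    return parse_timestamp(job["completedAt"]) - parse_timestamp(job["startedAt"])


job = {
    "startedAt": "2024-01-15T09:00:00Z",
    "completedAt": "2024-01-15T09:15:32Z",
}
print(job_duration(job))  # 0:15:32
```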

Job output is stored in JSONL (JSON Lines) format, where each line is a separate JSON object representing an event during execution:

{"type":"start","timestamp":"2024-01-15T09:00:00Z","message":"Job started"}
{"type":"tool_use","timestamp":"2024-01-15T09:00:05Z","tool":"Read","file":"src/index.ts"}
{"type":"output","timestamp":"2024-01-15T09:00:10Z","content":"Reading file contents..."}
{"type":"tool_use","timestamp":"2024-01-15T09:00:15Z","tool":"Edit","file":"src/index.ts"}
{"type":"complete","timestamp":"2024-01-15T09:15:32Z","exitReason":"end_turn"}
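
Each line's type field takes one of these values:

| Type | Description |
| --- | --- |
| start | Job execution began |
| output | Text output from Claude |
| tool_use | Tool invocation |
| tool_result | Tool execution result |
| error | Error occurred |
| complete | Job finished |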
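
Because every line is an independent JSON object, the output file can be processed one line at a time. The Python sketch below parses a few lines in the format of the sample output and summarizes them; `read_events` and `summarize` are hypothetical helpers, and the field names (`type`, `tool`) are taken from the sample rather than from a formal schema.

```python
import json
from collections import Counter

SAMPLE = """\
{"type":"start","timestamp":"2024-01-15T09:00:00Z","message":"Job started"}
{"type":"tool_use","timestamp":"2024-01-15T09:00:05Z","tool":"Read","file":"src/index.ts"}
{"type":"output","timestamp":"2024-01-15T09:00:10Z","content":"Reading file contents..."}
{"type":"tool_use","timestamp":"2024-01-15T09:00:15Z","tool":"Edit","file":"src/index.ts"}
"""


def read_events(text: str) -> list[dict]:
    """Parse JSONL text into a list of event dictionaries, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(events: list[dict]) -> dict:
    """Count events by type and list the tools that were invoked, in order."""
    return {
        "counts": Counter(event["type"] for event in events),
        "tools": [event["tool"] for event in events if event["type"] == "tool_use"],
    }


summary = summarize(read_events(SAMPLE))
print(summary["tools"])  # ['Read', 'Edit']
```

To process a real job, the same functions could be applied to the contents of that job's output.jsonl file.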

Use herdctl logs to view a job's output:
# View logs for a specific job
herdctl logs --job <job-id>
# View logs for an agent (shows recent jobs)
herdctl logs <agent-name>
# Follow logs in real-time
herdctl logs <agent-name> --follow
# Export to file
herdctl logs --job <job-id> > job-output.log

Use herdctl status to check on agents:
# Show all agents and their status
herdctl status
# Show specific agent status
herdctl status <agent-name>

A running job can be stopped before it finishes:
# Cancel a running job
herdctl cancel <job-id>

Jobs store their Claude session ID, enabling resume after interruption. This is useful when:

  • Network connectivity was lost
  • The system was restarted during execution
  • You want to continue an agent’s work interactively

Use herdctl sessions resume to pick a session back up:
# Resume the most recent session
herdctl sessions resume
# Resume by session ID (supports partial match)
herdctl sessions resume <session-id>
# Resume by agent name
herdctl sessions resume <agent-name>

See Sessions for more details on session management and resume capabilities.

Jobs are persisted to disk for history and recovery. See State Management for details on storage backends and configuration.

~/.herdctl/
├── jobs/
│   └── <job-id>/
│       ├── job.json       # Job metadata
│       └── output.jsonl   # Execution output
└── logs/
    └── <agent>/
        └── <job-id>.log   # Agent-specific logs
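
Given this layout, job history can be read straight from disk. The sketch below is a minimal Python illustration that assumes the directory structure shown above and takes the state root as a parameter; `job_paths` and `list_jobs` are hypothetical helpers, not part of herdctl.

```python
import json
from pathlib import Path


def job_paths(job_id: str, root: Path) -> dict[str, Path]:
    """Build the metadata and output paths for a job under the state root."""
    job_dir = root / "jobs" / job_id
    return {"metadata": job_dir / "job.json", "output": job_dir / "output.jsonl"}


def list_jobs(root: Path) -> list[dict]:
    """Load job.json from every job directory that has one, sorted by directory name."""
    jobs_dir = root / "jobs"
    if not jobs_dir.is_dir():
        return []
    records = []
    for job_dir in sorted(jobs_dir.iterdir()):
        metadata = job_dir / "job.json"
        if metadata.is_file():
            records.append(json.loads(metadata.read_text(encoding="utf-8")))
    return records
```

For example, `list_jobs(Path.home() / ".herdctl")` would return the metadata of every job stored under the default location shown in the tree.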