
Troubleshooting Scheduling Issues

This guide covers common scheduling issues you may encounter when running herdctl agents and how to resolve them.

# Check current fleet state
cat .herdctl/state.yaml
# View scheduler logs
tail -f .herdctl/logs/scheduler.log
# Check specific agent state
herdctl status my-agent
# Validate agent configuration
herdctl validate agents/my-agent.yaml

Symptoms: Agent schedule never runs, no jobs created.

Possible Causes:

Check if the schedule status is disabled:

# .herdctl/state.yaml
agents:
  my-agent:
    schedules:
      check-issues:
        status: disabled # <- Problem

Fix: Set status to idle:

agents:
  my-agent:
    schedules:
      check-issues:
        status: idle

The interval string may be malformed:

# Invalid examples
schedules:
  bad-1:
    type: interval
    interval: "5" # Missing unit
  bad-2:
    type: interval
    interval: "5.5m" # Decimals not supported
  bad-3:
    type: interval
    interval: "0m" # Zero not allowed
  bad-4:
    type: interval
    interval: "-5m" # Negative not allowed

Fix: Use valid format {positive-integer}{unit}:

schedules:
  good:
    type: interval
    interval: "5m" # Valid: 5 minutes
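
The format rule above can be checked before a config is loaded. The following is a minimal sketch, not herdctl's own parser: `toMilliseconds` and `UNIT_MS` are hypothetical names, and the unit set (s, m, h, d) is assumed from the valid examples in this guide. Whether herdctl accepts leading zeros such as "05m" is not stated here; this sketch accepts them.

```typescript
// Hypothetical lookup table: milliseconds per unit letter (assumed unit set).
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Returns the interval in milliseconds, or null when the string does not
// match {positive-integer}{unit}.
function toMilliseconds(interval: string): number | null {
  // Digits only (no sign, no decimal point), followed by exactly one unit letter.
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (match === null) return null;

  const value = Number(match[1]);
  const unitMs = UNIT_MS[match[2] ?? ""];
  // Zero is rejected, matching the "0m" example above.
  if (unitMs === undefined || value <= 0) return null;
  return value * unitMs;
}

console.log(toMilliseconds("5m"));   // 300000
console.log(toMilliseconds("5.5m")); // null
```

Returning null instead of throwing keeps the helper easy to use in a pre-flight check that reports every bad interval at once.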

For interval type schedules, the interval field is required:

# Missing interval
schedules:
  broken:
    type: interval
    # interval: missing!
    prompt: "Do something"

Fix: Add the interval field.

The scheduler processes both interval and cron type schedules. However, webhook and chat types are not automatically triggered by the scheduler — they rely on external events (incoming HTTP requests or chat messages).

schedules:
  # These ARE processed by the scheduler
  my-interval:
    type: interval
    interval: "5m"
    prompt: "Check for updates"
  my-cron:
    type: cron
    expression: "0 9 * * *"
    prompt: "Run daily report"
  # These are NOT processed by the scheduler
  my-webhook:
    type: webhook # Triggered by incoming HTTP requests
    prompt: "Handle webhook"
  my-chat:
    type: chat # Triggered by chat messages
    prompt: "Respond to message"

Fix: If your schedule is not triggering, verify that you are using type: interval or type: cron. For interval schedules, ensure the interval field is set. For cron schedules, ensure the expression field contains a valid cron expression. Webhook and chat schedules require their respective external trigger mechanisms to be running.
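
The checks described above can be expressed as a small pre-flight function. This is a sketch under stated assumptions: `ScheduleConfig` and `whyNotScheduled` are hypothetical names, the fields mirror the YAML examples in this guide, and the function only tests that the required field is present. It does not validate the interval format or the cron expression itself.

```typescript
// Hypothetical shape of one schedule entry, mirroring the YAML examples.
type ScheduleConfig = {
  type: "interval" | "cron" | "webhook" | "chat";
  interval?: string;
  expression?: string;
  prompt?: string;
};

// Returns a reason the scheduler would not fire this entry on its own,
// or null when no obvious configuration problem is found.
function whyNotScheduled(schedule: ScheduleConfig): string | null {
  if (schedule.type === "interval") {
    return schedule.interval ? null : "interval schedule is missing the interval field";
  }
  if (schedule.type === "cron") {
    return schedule.expression ? null : "cron schedule is missing the expression field";
  }
  // webhook and chat entries are driven by external events, not by the scheduler.
  return `${schedule.type} schedules wait for an external trigger`;
}

console.log(whyNotScheduled({ type: "interval", prompt: "Do something" }));
// interval schedule is missing the interval field
```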

Symptoms: Schedule shows status: running but no job is active.

Possible Causes:

If herdctl crashed or was killed while a job was running:

agents:
  my-agent:
    schedules:
      stuck-schedule:
        status: running # Stuck from previous crash
        last_run_at: "2025-01-19T10:00:00Z"

Fix: Reset the schedule status:

agents:
  my-agent:
    schedules:
      stuck-schedule:
        status: idle # Reset to idle
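
When many schedules are stuck, the same reset can be applied to the parsed state in one pass. The sketch below assumes the state file has already been loaded into a plain object with the shape shown in the YAML above; `FleetState`, `ScheduleState`, and `resetStuckSchedules` are hypothetical names, not part of herdctl. Run something like this only while the scheduler is stopped and no job is genuinely active.

```typescript
// Hypothetical types mirroring the structure of .herdctl/state.yaml.
type ScheduleState = { status: string; last_run_at?: string; next_run_at?: string };
type FleetState = {
  agents: Record<string, { schedules: Record<string, ScheduleState> }>;
};

// Sets every schedule with status "running" back to "idle" (in place)
// and returns the "agent/schedule" names that were changed.
function resetStuckSchedules(state: FleetState): string[] {
  const reset: string[] = [];
  for (const [agentName, agent] of Object.entries(state.agents)) {
    for (const [scheduleName, schedule] of Object.entries(agent.schedules)) {
      if (schedule.status === "running") {
        schedule.status = "idle";
        reset.push(`${agentName}/${scheduleName}`);
      }
    }
  }
  return reset;
}

const example: FleetState = {
  agents: {
    "my-agent": {
      schedules: {
        "stuck-schedule": { status: "running", last_run_at: "2025-01-19T10:00:00Z" },
      },
    },
  },
};
console.log(resetStuckSchedules(example)); // [ 'my-agent/stuck-schedule' ]
```

Disabled and idle schedules are left untouched, so the function is safe to run repeatedly.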

If shutdown timed out while waiting for jobs:

[scheduler] Shutdown timed out with 1 job(s) still running

Fix: Check for orphaned processes and reset state:

# Check for orphaned Claude processes (the [c] keeps grep from matching itself)
ps aux | grep '[c]laude'
# Reset schedule state
# Edit .herdctl/state.yaml

Symptoms: Jobs run back-to-back without waiting the full interval.

Possible Causes:

Running multiple scheduler instances will cause duplicate triggers:

# Check for multiple processes (the [h] keeps grep from matching itself)
ps aux | grep '[h]erdctl'

Fix: Stop duplicate instances. Only one scheduler should run.

If system time changed or was adjusted, next trigger times may be in the past:

# State shows past time
schedules:
  my-schedule:
    next_run_at: "2025-01-19T09:00:00Z" # In the past

Fix: The scheduler handles this automatically by triggering immediately when the calculated next time is in the past. This is expected behavior after system sleep or clock adjustments.
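
The rule "trigger immediately when the next time is in the past" amounts to clamping the wait at zero. The following sketch illustrates that logic only; `msUntilNextRun` is a hypothetical helper and is not taken from the herdctl source.

```typescript
// Returns how long to wait before the schedule is due, in milliseconds.
// A next_run_at in the past yields 0, meaning "trigger now".
function msUntilNextRun(nextRunAt: string, now: Date): number {
  const due = Date.parse(nextRunAt);
  if (Number.isNaN(due)) {
    throw new Error(`Unparseable next_run_at: ${nextRunAt}`);
  }
  return Math.max(0, due - now.getTime());
}

const now = new Date("2025-01-19T10:00:00Z");
console.log(msUntilNextRun("2025-01-19T09:00:00Z", now)); // 0 (already overdue)
console.log(msUntilNextRun("2025-01-19T10:05:00Z", now)); // 300000 (five minutes)
```

This also explains why a single catch-up run, rather than a burst of missed runs, is what you would expect after the machine wakes from sleep under this model.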

Symptoms: Scheduler logs show “at max capacity” skip messages.

[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)

Causes:

This occurs when:

  • A job is already running for this agent
  • The agent’s max_concurrent limit has been reached

Fix: This is normal behavior. Options:

  1. Wait: The schedule will trigger once current jobs complete
  2. Increase capacity: Raise max_concurrent if appropriate

    instances:
      max_concurrent: 2 # Allow 2 concurrent jobs

  3. Reduce job duration: Optimize agent prompts for faster execution

Symptoms: Scheduler logs show “already running” skip messages.

[scheduler] Skipping my-agent/check-issues: already running

Cause: The same schedule is currently executing a job.

Fix: This is expected behavior. A schedule can only have one active job at a time. Wait for the current job to complete.
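
The two skip conditions described in this guide ("already running" and "at max capacity") can be modeled as one decision function. This is an illustrative sketch: `SkipInput` and `skipReason` are hypothetical names, and the order in which the real scheduler evaluates the two checks is an assumption.

```typescript
// Hypothetical inputs for one schedule check.
type SkipInput = {
  scheduleStatus: "idle" | "running";
  runningJobs: number;   // jobs currently active for the agent
  maxConcurrent: number; // the agent's max_concurrent limit
};

// Returns the skip reason in the wording used by the log messages above,
// or null when the schedule may trigger.
function skipReason(input: SkipInput): string | null {
  // Assumed order: the per-schedule check runs before the per-agent check.
  if (input.scheduleStatus === "running") {
    return "already running";
  }
  if (input.runningJobs >= input.maxConcurrent) {
    return `at max capacity (${input.runningJobs}/${input.maxConcurrent})`;
  }
  return null;
}

console.log(skipReason({ scheduleStatus: "idle", runningJobs: 1, maxConcurrent: 1 }));
// at max capacity (1/1)
```

Raising max_concurrent only removes the second reason; a schedule whose own job is still active is skipped regardless of the agent's capacity.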

Symptoms: Jobs start but fail immediately with work source errors.

Possible Causes:

Error: Bad credentials

Fix: Check your GITHUB_TOKEN environment variable:

# Verify token is set (without printing the secret)
[ -n "$GITHUB_TOKEN" ] && echo "GITHUB_TOKEN is set" || echo "GITHUB_TOKEN is not set"
# Verify token has correct permissions
gh auth status

Error: Resource not accessible by integration

Fix: Ensure the token has access to the configured repository.

Error: Label 'ready-for-dev' not found

Fix: Create the required labels in your GitHub repository.

Symptoms: Error messages about invalid interval format.

IntervalParseError: Invalid time unit "min" in interval "5min"

Fix: Use valid unit abbreviations:

Valid   Invalid
5s      5sec, 5seconds
5m      5min, 5minutes
1h      1hr, 1hour
1d      1day

Symptoms: Errors reading or parsing state file.

Error: YAML parsing failed

Fix:

  1. Back up current state:

    cp .herdctl/state.yaml .herdctl/state.yaml.backup

  2. Validate YAML syntax:

    # Check for syntax errors (requires the PyYAML package)
    python -c "import yaml; yaml.safe_load(open('.herdctl/state.yaml'))"

  3. Reset state if necessary:

    # Remove corrupted state (schedules will trigger immediately on restart)
    rm .herdctl/state.yaml

Symptoms: Scheduler fails to start with an error.

Possible Causes:

Error: ENOENT: no such file or directory '.herdctl'

Fix: Create the state directory:

mkdir -p .herdctl

Error: EACCES: permission denied

Fix: Check directory permissions:

ls -la .herdctl/
chmod 755 .herdctl
chmod 644 .herdctl/state.yaml

Error: Scheduler is already running

Fix: Stop the existing scheduler instance first.

Set the log level to debug for more detailed output:

HERDCTL_LOG_LEVEL=debug herdctl start

View the raw state file to understand current schedule status:

grep -A 10 "schedules:" .herdctl/state.yaml

The scheduler logs each check cycle. Look for patterns:

# Count trigger attempts
grep "Triggering" .herdctl/logs/scheduler.log | wc -l
# View skip reasons
grep "Skipping" .herdctl/logs/scheduler.log | tail -20
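
For longer logs, grouping skip messages by schedule and reason makes patterns easier to spot than reading the tail. The sketch below assumes log lines follow the `[scheduler] Skipping <agent>/<schedule>: <reason>` shape shown in the examples in this guide; `countSkipReasons` is a hypothetical helper, and real log lines may carry timestamps or other prefixes, which the pattern tolerates.

```typescript
// Counts skip messages per "agent/schedule: reason" key.
function countSkipReasons(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  // Unanchored at the start so a leading timestamp does not break matching.
  const pattern = /\[scheduler\] Skipping (\S+): (.+)$/;
  for (const line of logLines) {
    const match = pattern.exec(line);
    if (match === null) continue;
    // Drop a trailing "(1/1)" style counter so capacity skips group together.
    const reason = (match[2] ?? "").replace(/\s*\(\d+\/\d+\)$/, "");
    const key = `${match[1]}: ${reason}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const sample = [
  "[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)",
  "[scheduler] Skipping my-agent/check-issues: already running",
  "[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)",
];
for (const [key, count] of countSkipReasons(sample)) {
  console.log(`${count}x ${key}`);
}
```

A schedule that dominates the counts with "at max capacity" is a candidate for a higher max_concurrent or a longer interval.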

Verify your interval strings are valid:

import { parseInterval } from "@herdctl/core/scheduler";
// Test your intervals
console.log(parseInterval("5m")); // 300000 (milliseconds)
console.log(parseInterval("1h")); // 3600000

To reset all schedules to idle (triggers immediate execution):

# Backup first
cp .herdctl/state.yaml .herdctl/state.yaml.backup
# Remove schedule states (or edit to set all to idle)
# Schedules will reinitialize on next scheduler start

To force a schedule to trigger immediately:

# Set next_run_at to a past time
agents:
  my-agent:
    schedules:
      force-me:
        status: idle
        next_run_at: "2020-01-01T00:00:00Z" # Past date

If jobs are stuck, you can clear them by:

  1. Stopping the scheduler
  2. Resetting schedule states to idle
  3. Clearing any orphaned processes
  4. Restarting the scheduler

For large agent fleets, increase the check interval:

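// Assumption: Scheduler is exported from the same module as parseInterval
import { Scheduler } from "@herdctl/core/scheduler";

const scheduler = new Scheduler({
  checkInterval: 5000, // Check every 5 seconds instead of 1
  stateDir: ".herdctl",
});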

The scheduler checks all agents every cycle. For better performance:

  • Group related schedules into single agents
  • Use appropriate intervals (don’t poll every second if 5 minutes is sufficient)
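  • For very large fleets, consider splitting agents across scheduler instances, making sure no agent is managed by more than one instance (overlapping instances cause duplicate triggers)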

Watch for:

  • High CPU from frequent checks
  • Memory growth from accumulated state
  • Disk I/O from state file updates

# Monitor herdctl resource usage (Linux; -d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, -f herdctl)"

If you’re still having issues:

  1. Check logs: Look in .herdctl/logs/ for detailed error messages
  2. Validate configuration: Run herdctl validate on your agent configs
  3. Review state: Inspect .herdctl/state.yaml for inconsistencies
  4. File an issue: Report bugs at the project repository with:
    • Your agent configuration (sanitized)
    • Relevant log output
    • Steps to reproduce