Troubleshooting Scheduling Issues
This guide covers common scheduling issues you may encounter when running herdctl agents and how to resolve them.
Quick Diagnostic Commands
```bash
# Check current fleet state
cat .herdctl/state.yaml

# View scheduler logs
tail -f .herdctl/logs/scheduler.log

# Check specific agent state
herdctl status my-agent

# Validate agent configuration
herdctl validate agents/my-agent.yaml
```

Common Issues

Schedule Not Triggering

Symptoms: Agent schedule never runs, no jobs created.
Possible Causes:
1. Schedule is Disabled
Check if the schedule status is `disabled`:

```yaml
agents:
  my-agent:
    schedules:
      check-issues:
        status: disabled  # <- Problem
```

Fix: Set status to `idle`:

```yaml
agents:
  my-agent:
    schedules:
      check-issues:
        status: idle
```

2. Invalid Interval Format

The interval string may be malformed:

```yaml
# Invalid examples
schedules:
  bad-1:
    type: interval
    interval: "5"     # Missing unit
  bad-2:
    type: interval
    interval: "5.5m"  # Decimals not supported
  bad-3:
    type: interval
    interval: "0m"    # Zero not allowed
  bad-4:
    type: interval
    interval: "-5m"   # Negative not allowed
```

Fix: Use the valid format `{positive-integer}{unit}`:

```yaml
schedules:
  good:
    type: interval
    interval: "5m"  # Valid: 5 minutes
```

3. Missing Interval Field

For `interval` type schedules, the `interval` field is required:

```yaml
# Missing interval
schedules:
  broken:
    type: interval
    # interval: missing!
    prompt: "Do something"
```

Fix: Add the `interval` field.
4. Wrong Schedule Type
The scheduler processes both `interval` and `cron` type schedules. The `webhook` and `chat` types are not triggered by the scheduler; they rely on external events (incoming HTTP requests or chat messages).

```yaml
# These ARE processed by the scheduler
schedules:
  my-interval:
    type: interval
    interval: "5m"
    prompt: "Check for updates"
  my-cron:
    type: cron
    expression: "0 9 * * *"
    prompt: "Run daily report"
```

```yaml
# These are NOT processed by the scheduler
schedules:
  my-webhook:
    type: webhook  # Triggered by incoming HTTP requests
    prompt: "Handle webhook"
  my-chat:
    type: chat     # Triggered by chat messages
    prompt: "Respond to message"
```

Fix: If your schedule is not triggering, verify that you are using `type: interval` or `type: cron`. For interval schedules, ensure the `interval` field is set. For cron schedules, ensure the `expression` field contains a valid cron expression. Webhook and chat schedules require their respective external trigger mechanisms to be running.
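The checks in this section can be gathered into one small helper. The sketch below is illustrative only: `ScheduleConfig` and `diagnoseSchedule` are hypothetical names rather than herdctl APIs, the field names are copied from the YAML examples above, and treating any other `type` value as unknown is an assumption.

```typescript
// Hypothetical shape mirroring the YAML examples in this guide; not herdctl's own types.
interface ScheduleConfig {
  type: string;
  interval?: string;
  expression?: string;
  status?: string;
}

// Lists reasons why the scheduler would not trigger this schedule on its own.
function diagnoseSchedule(schedule: ScheduleConfig): string[] {
  const problems: string[] = [];
  if (schedule.status === "disabled") {
    problems.push("status is disabled; set it to idle");
  }
  if (schedule.type === "interval") {
    if (!schedule.interval) {
      problems.push("interval schedules require the interval field");
    }
  } else if (schedule.type === "cron") {
    if (!schedule.expression) {
      problems.push("cron schedules require the expression field");
    }
  } else if (schedule.type === "webhook" || schedule.type === "chat") {
    problems.push(schedule.type + " schedules are triggered by external events, not by the scheduler");
  } else {
    // Assumption: only the four types named in this guide exist.
    problems.push('unknown schedule type "' + schedule.type + '"');
  }
  return problems;
}

// A disabled interval schedule with no interval field yields two findings.
console.log(diagnoseSchedule({ type: "interval", status: "disabled" }));
```

An empty result means none of the four causes listed above applies, so the next place to look is the scheduler log.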
Schedule Stuck in “Running” State
Symptoms: Schedule shows `status: running` but no job is active.
Possible Causes:
1. Process Crashed During Job
If herdctl crashed or was killed while a job was running, the schedule can remain marked as `running`:

```yaml
agents:
  my-agent:
    schedules:
      stuck-schedule:
        status: running  # Stuck from previous crash
        last_run_at: "2025-01-19T10:00:00Z"
```

Fix: Reset the schedule status:

```yaml
agents:
  my-agent:
    schedules:
      stuck-schedule:
        status: idle  # Reset to idle
```

2. Graceful Shutdown Timeout

If shutdown timed out while waiting for jobs:

```
[scheduler] Shutdown timed out with 1 job(s) still running
```

Fix: Check for orphaned processes and reset state:

```bash
# Check for orphaned Claude processes
ps aux | grep claude

# Reset schedule state
# Edit .herdctl/state.yaml
```

Schedule Triggers Too Frequently

Symptoms: Jobs run back-to-back without waiting the full interval.
Possible Causes:
1. Multiple Scheduler Instances
Running multiple scheduler instances will cause duplicate triggers:

```bash
# Check for multiple processes
ps aux | grep herdctl
```

Fix: Stop duplicate instances. Only one scheduler should run.

2. Clock Skew

If the system time changed or was adjusted, next trigger times may be in the past:

```yaml
# State shows past time
schedules:
  my-schedule:
    next_run_at: "2025-01-19T09:00:00Z"  # In the past
```

Fix: No action is needed. The scheduler handles this automatically by triggering immediately when the calculated next time is in the past. This is expected behavior after system sleep or clock adjustments.
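The "trigger immediately when the next time is in the past" rule can be pictured as a plain timestamp comparison. This is a sketch of the behavior described above, not herdctl's code: `isDue` and `nextRunAfter` are hypothetical helpers, and scheduling the following run one full interval after the trigger time is an assumption.

```typescript
// Hypothetical helper: a schedule is due once its stored next_run_at is not in the future.
function isDue(nextRunAt: string, now: Date): boolean {
  return new Date(nextRunAt).getTime() <= now.getTime();
}

// Assumption: after a trigger, the next run is placed one full interval after "now",
// so a stale next_run_at produces one immediate run rather than a series of catch-up runs.
function nextRunAfter(now: Date, intervalMs: number): string {
  return new Date(now.getTime() + intervalMs).toISOString();
}

const now = new Date("2025-01-19T10:00:00Z");
console.log(isDue("2025-01-19T09:00:00Z", now)); // true: one hour in the past
console.log(isDue("2025-01-19T10:05:00Z", now)); // false: five minutes in the future
console.log(nextRunAfter(now, 300000));          // 2025-01-19T10:05:00.000Z
```

If jobs keep running back-to-back even though `next_run_at` moves forward by a full interval each time, a second scheduler instance is the more likely cause.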
Schedule Skipped: At Capacity
Symptoms: Scheduler logs show “at max capacity” skip messages.

```
[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)
```

Causes:

This occurs when:
- A job is already running for this agent
- The agent’s `max_concurrent` limit has been reached

Fix: This is normal behavior. Options:
- Wait: The schedule will trigger once current jobs complete
- Increase capacity: Raise `max_concurrent` if appropriate

  ```yaml
  instances:
    max_concurrent: 2  # Allow 2 concurrent jobs
  ```

- Reduce job duration: Optimize agent prompts for faster execution
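To decide whether raising `max_concurrent` is worthwhile, it helps to know which schedules are skipped most often. The sketch below tallies skip lines per schedule. It assumes each skip line contains `[scheduler] Skipping <agent>/<schedule>: <reason>` as in the example above; lines in any other shape are ignored. `tallySkips` is a hypothetical helper, not a herdctl API.

```typescript
// Counts skip messages per "agent/schedule" key.
function tallySkips(logLines: string[]): Record<string, number> {
  // Not anchored at the line start, in case log lines carry a timestamp prefix.
  const pattern = /\[scheduler\] Skipping ([^\/\s]+)\/([^:\s]+):/;
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    const match = pattern.exec(line);
    if (!match) continue;
    const key = match[1] + "/" + match[2];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sampleLines = [
  "[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)",
  "[scheduler] Skipping my-agent/process-issues: at max capacity (1/1)",
  "a line in some other format",
];
console.log(tallySkips(sampleLines)); // my-agent/process-issues is counted twice
```

In practice the lines would come from `.herdctl/logs/scheduler.log`; reading the file is left out to keep the sketch self-contained.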
Schedule Skipped: Already Running
Symptoms: Scheduler logs show “already running” skip messages.

```
[scheduler] Skipping my-agent/check-issues: already running
```

Cause: The same schedule is currently executing a job.

Fix: This is expected behavior. A schedule can only have one active job at a time. Wait for the current job to complete. If no job is actually active, see Schedule Stuck in “Running” State.
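Taken together, the two skip conditions amount to a pair of checks made before each trigger. The following is a sketch of that logic as this guide describes it, not herdctl's source; the function name, the parameter names, and the order of the checks are assumptions.

```typescript
type SkipReason = "already running" | "at max capacity" | null;

// Returns why a due schedule would be skipped, or null if it may trigger.
function skipReason(
  scheduleStatus: string,      // status of this schedule, e.g. "idle" or "running"
  runningJobsForAgent: number, // jobs currently active for the whole agent
  maxConcurrent: number        // the agent's max_concurrent limit
): SkipReason {
  if (scheduleStatus === "running") return "already running";
  if (runningJobsForAgent >= maxConcurrent) return "at max capacity";
  return null;
}

console.log(skipReason("running", 1, 2)); // already running
console.log(skipReason("idle", 1, 1));    // at max capacity
console.log(skipReason("idle", 0, 1));    // null (the schedule may trigger)
```

The first case shows that raising `max_concurrent` does not help a schedule that is skipped because its own previous job is still active.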
Jobs Fail with Work Source Errors
Symptoms: Jobs start but fail immediately with work source errors.

Possible Causes:

1. GitHub Token Issues

```
Error: Bad credentials
```

Fix: Check your `GITHUB_TOKEN` environment variable:

```bash
# Verify token is set (prints "set" without revealing the value)
echo "${GITHUB_TOKEN:+set}"

# Verify token has correct permissions
gh auth status
```

2. Repository Access

```
Error: Resource not accessible by integration
```

Fix: Ensure the token has access to the configured repository.

3. Label Not Found

```
Error: Label 'ready-for-dev' not found
```

Fix: Create the required labels in your GitHub repository.
Interval Parsing Errors
Symptoms: Error messages about invalid interval format.

```
IntervalParseError: Invalid time unit "min" in interval "5min"
```

Fix: Use valid unit abbreviations:

| Valid | Invalid |
|---|---|
| `5s` | `5sec`, `5seconds` |
| `5m` | `5min`, `5minutes` |
| `1h` | `1hr`, `1hour` |
| `1d` | `1day` |
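The rules in the table, combined with the `{positive-integer}{unit}` format, can be expressed as a short validator. This is a minimal sketch for checking strings before they go into a config file; `parseIntervalSketch` is a hypothetical stand-in, not herdctl's parser, and its error messages are invented for illustration.

```typescript
// Milliseconds per unit, for the four unit abbreviations listed in the table above.
const UNIT_MS = new Map<string, number>([
  ["s", 1000],
  ["m", 60000],
  ["h", 3600000],
  ["d", 86400000],
]);

// Returns the interval in milliseconds, or throws if the string is not valid.
function parseIntervalSketch(input: string): number {
  const match = /^([0-9]+)([a-zA-Z]+)$/.exec(input);
  if (!match) {
    throw new Error('Invalid interval "' + input + '": expected {positive-integer}{unit}');
  }
  const [, digits = "", unit = ""] = match;
  const unitMs = UNIT_MS.get(unit);
  if (unitMs === undefined) {
    throw new Error('Invalid time unit "' + unit + '" in interval "' + input + '"');
  }
  const value = Number(digits);
  if (value <= 0) {
    throw new Error('Interval "' + input + '" must be greater than zero');
  }
  return value * unitMs;
}

for (const candidate of ["5m", "1h", "5", "5.5m", "0m", "5min"]) {
  try {
    console.log(candidate, "->", parseIntervalSketch(candidate));
  } catch (error) {
    console.log(candidate, "->", (error as Error).message);
  }
}
```

Only `5m` and `1h` pass here; the other four candidates fail for the reasons this guide lists (missing unit, decimal, zero, and unknown unit).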
State File Corruption
Symptoms: Errors reading or parsing state file.

```
Error: YAML parsing failed
```

Fix:

- Back up the current state:

  ```bash
  cp .herdctl/state.yaml .herdctl/state.yaml.backup
  ```

- Validate YAML syntax:

  ```bash
  # Check for syntax errors (requires PyYAML)
  python -c "import yaml; yaml.safe_load(open('.herdctl/state.yaml'))"
  ```

- Reset state if necessary:

  ```bash
  # Remove corrupted state (schedules will trigger immediately on restart)
  rm .herdctl/state.yaml
  ```
Scheduler Won’t Start
Symptoms: Scheduler fails to start with an error.

Possible Causes:

1. State Directory Missing

```
Error: ENOENT: no such file or directory '.herdctl'
```

Fix: Create the state directory:

```bash
mkdir -p .herdctl
```

2. Permission Denied

```
Error: EACCES: permission denied
```

Fix: Check and correct directory permissions:

```bash
ls -la .herdctl/
chmod 755 .herdctl
chmod 644 .herdctl/state.yaml
```

3. Already Running

```
Error: Scheduler is already running
```

Fix: Stop the existing scheduler instance first.
Debugging Tips
Enable Debug Logging

Set the log level to debug for more detailed output:

```bash
HERDCTL_LOG_LEVEL=debug herdctl start
```

Inspect Schedule State

View the raw state file to understand current schedule status:

```bash
grep -A 10 "schedules:" .herdctl/state.yaml
```

Trace Scheduler Checks

The scheduler logs each check cycle. Look for patterns:

```bash
# Count trigger attempts
grep -c "Triggering" .herdctl/logs/scheduler.log

# View skip reasons
grep "Skipping" .herdctl/logs/scheduler.log | tail -20
```

Test Interval Parsing

Verify your interval strings are valid:

```typescript
import { parseInterval } from "@herdctl/core/scheduler";

// Test your intervals
console.log(parseInterval("5m")); // 300000 (milliseconds)
console.log(parseInterval("1h")); // 3600000
```

Recovery Procedures
Reset All Schedule States

To reset all schedules to `idle` (triggers immediate execution):

```bash
# Backup first
cp .herdctl/state.yaml .herdctl/state.yaml.backup

# Remove schedule states (or edit to set all to idle)
# Schedules will reinitialize on next scheduler start
```

Force Immediate Trigger

To force a schedule to trigger immediately, set its `next_run_at` in `.herdctl/state.yaml` to a time in the past:

```yaml
# Set next_run_at to a past time
agents:
  my-agent:
    schedules:
      force-me:
        status: idle
        next_run_at: "2020-01-01T00:00:00Z"  # Past date
```

Clear Stuck Jobs

If jobs are stuck, you can clear them by:
- Stopping the scheduler
- Resetting schedule states to `idle`
- Clearing any orphaned processes
- Restarting the scheduler
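The reset step above can be sketched as a pure function over the parsed state. The types below are a hypothetical in-memory shape based on the YAML examples in this guide, not herdctl's own types, and reading or writing `.herdctl/state.yaml` is deliberately left out. As in the list above, the scheduler should be stopped before the state is changed.

```typescript
// Hypothetical shapes mirroring the state examples in this guide.
interface ScheduleState {
  status: string;
  last_run_at?: string;
  next_run_at?: string;
}
interface FleetState {
  agents: Record<string, { schedules: Record<string, ScheduleState> }>;
}

// Returns a copy of the state with every "running" schedule set to "idle",
// plus the names of the schedules that were changed. The input is not modified.
function resetStuckSchedules(state: FleetState): { state: FleetState; reset: string[] } {
  const reset: string[] = [];
  const agents: FleetState["agents"] = {};
  for (const agentName of Object.keys(state.agents)) {
    const schedules: Record<string, ScheduleState> = {};
    const original = state.agents[agentName].schedules;
    for (const scheduleName of Object.keys(original)) {
      const schedule = original[scheduleName];
      if (schedule.status === "running") {
        reset.push(agentName + "/" + scheduleName);
        schedules[scheduleName] = { ...schedule, status: "idle" };
      } else {
        schedules[scheduleName] = { ...schedule };
      }
    }
    agents[agentName] = { schedules };
  }
  return { state: { agents }, reset };
}

const example: FleetState = {
  agents: {
    "my-agent": {
      schedules: {
        "stuck-schedule": { status: "running", last_run_at: "2025-01-19T10:00:00Z" },
      },
    },
  },
};
console.log(resetStuckSchedules(example).reset); // lists my-agent/stuck-schedule
```

Schedules with any other status, including `disabled`, are copied unchanged, so the reset does not re-enable anything that was turned off on purpose.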
Performance Tuning
Reduce Check Frequency

For large agent fleets, increase the check interval:

```typescript
const scheduler = new Scheduler({
  checkInterval: 5000, // Check every 5 seconds instead of 1
  stateDir: ".herdctl",
});
```

Optimize Agent Count

The scheduler checks all agents every cycle. For better performance:
- Group related schedules into single agents
- Use appropriate intervals (don’t poll every second if 5 minutes is sufficient)
- Consider splitting very large fleets across multiple scheduler instances, making sure no two instances manage the same schedules, since that causes duplicate triggers

Monitor Resource Usage

Watch for:
- High CPU from frequent checks
- Memory growth from accumulated state
- Disk I/O from state file updates

```bash
# Monitor herdctl resource usage (comma-separated PIDs, in case several processes match)
top -p "$(pgrep -d, -f herdctl)"
```

Getting Help
If you’re still having issues:
- Check logs: Look in `.herdctl/logs/` for detailed error messages
- Validate configuration: Run `herdctl validate` on your agent configs
- Review state: Inspect `.herdctl/state.yaml` for inconsistencies
- File an issue: Report bugs at the project repository with:
  - Your agent configuration (sanitized)
  - Relevant log output
  - Steps to reproduce