# Fleet Coordination System

**Replaced:** Markdown Kanban + 5 shared docs (~17KB/cycle)  
**New:** JSON state files + CLI scripts (~1.5KB/cycle)

## Architecture

```
/home/ccuser/shared/
├── fleet-state.json       # Single source of truth (~2KB)
├── locks/                 # Task claim files (auto-cleaned)
├── logs/                  # Stall detector logs
├── scripts/
│   ├── fleet-cli.js       # Unified CLI (agents use this)
│   ├── fleet-update.js    # Update agent/task/alert state
│   ├── agent-startup.js   # Fleet briefing for agent boot
│   └── stall-detector.js  # Cron: detect stalls, wake agents
├── ROADMAP.md             # Read ONCE per session (not every heartbeat)
├── LESSONS.md             # Read ONCE per session (not every heartbeat)
└── COMPANY.md             # Read ONCE per session (not every heartbeat)

/home/ccuser/{agent}/
├── status.json            # Agent's current state (updated every heartbeat)
├── queue.json             # Agent's personal task queue
├── HEARTBEAT.md           # Updated with fleet coordination protocol
└── SOUL.md                # Identity (read once per session)
```
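This doc does not spell out the `fleet-state.json` schema. Here is a minimal sketch, assuming hypothetical top-level `agents`, `tasks`, and `alerts` keys. It shows how a script could load the file defensively and write it back atomically (temp file plus rename), so the cron job and the agents never read a half-written file:

```javascript
// Sketch only: the top-level keys (agents, tasks, alerts) are assumptions,
// not the documented fleet-state.json schema.
const fs = require("fs");

// Read the state file, defaulting any missing collection to empty.
function loadFleetState(file) {
  const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
  return {
    agents: parsed.agents && typeof parsed.agents === "object" ? parsed.agents : {},
    tasks: Array.isArray(parsed.tasks) ? parsed.tasks : [],
    alerts: Array.isArray(parsed.alerts) ? parsed.alerts : [],
  };
}

// Write to a temp file in the same directory, then rename over the target.
// rename() replaces the file in one step on POSIX filesystems, so readers
// see either the old or the new state, never a partial write.
// Returns the byte size, handy for checking the ~2KB budget.
function saveFleetState(file, state) {
  const json = JSON.stringify(state, null, 2);
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, json);
  fs.renameSync(tmp, file);
  return Buffer.byteLength(json);
}
```

Rename-based writes prevent torn reads. They do not stop two writers from overwriting each other's changes, so some form of write serialization would still be needed.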

## Agent Heartbeat Protocol

Every agent follows this 5-step cycle:

1. **Read fleet state** → `node /home/ccuser/shared/scripts/fleet-cli.js briefing <agent> --brief`
2. **Read own queue** → `/home/ccuser/{agent}/queue.json`
3. **Do the work** → Execute queue tasks or domain tasks
4. **Update status** → `node /home/ccuser/shared/scripts/fleet-update.js <agent> --status active --task "..."`
5. **Report issues** → `node /home/ccuser/shared/scripts/fleet-cli.js alert <agent> "..."` for problems, or `decide <agent> "..."` when a human decision is needed
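One heartbeat cycle can be sketched as a small Node driver around the same scripts. This is an illustration under several assumptions, not the agents' actual code:

- `queue.json` holds a JSON array of task objects with an `id` field.
- `fleet-update.js` accepts `--status idle` without `--task`.
- `doWork` is a hypothetical callback standing in for the agent's own work.

```javascript
// Sketch of one heartbeat cycle. Assumed, not confirmed by the docs:
// queue.json is a JSON array of { id, ... } task objects, and doWork is a
// hypothetical callback that performs the agent's actual work.
const { execFileSync } = require("child_process");
const fs = require("fs");
const path = require("path");

const SCRIPTS = "/home/ccuser/shared/scripts";

// Default runner: execute a fleet script with node and return its stdout.
function runScript(script, args) {
  return execFileSync("node", [path.join(SCRIPTS, script), ...args], {
    encoding: "utf8",
  });
}

// Default queue reader: /home/ccuser/<agent>/queue.json.
function readQueueFile(agent) {
  const file = path.join("/home/ccuser", agent, "queue.json");
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

function heartbeat(agent, { run = runScript, readQueue = readQueueFile, doWork }) {
  // 1. Read fleet state as a compact briefing.
  const briefing = run("fleet-cli.js", ["briefing", agent, "--brief"]);
  // 2. Read own queue; take the first task, if any.
  const queue = readQueue(agent);
  const task = queue.length > 0 ? queue[0] : null;
  try {
    // 3. Do the work.
    if (task) doWork(task, briefing);
    // 4. Update status so the stall detector sees a fresh timestamp.
    const args = task
      ? [agent, "--status", "active", "--task", task.id]
      : [agent, "--status", "idle"];
    run("fleet-update.js", args);
  } catch (err) {
    // 5. Report the failure as an alert rather than failing silently.
    run("fleet-cli.js", ["alert", agent, `heartbeat failed: ${err.message}`]);
  }
  return task ? task.id : null;
}
```

Because `run` and `readQueue` are injectable, the cycle can be exercised without invoking the real scripts or touching live fleet state.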

## Fleet CLI Commands

```bash
# Commands below assume the working directory is /home/ccuser/shared
# Fleet overview
node scripts/fleet-cli.js status

# Agent briefing (full or compact)
node scripts/fleet-cli.js briefing <agent>
node scripts/fleet-cli.js briefing <agent> --brief

# Update agent status
node scripts/fleet-cli.js update <agent> --status active --task "task-name" --progress 0.5
node scripts/fleet-cli.js update <agent> --status idle
node scripts/fleet-cli.js update <agent> --status blocked --blocked-on "reason"

# Task management
node scripts/fleet-cli.js claim <agent> <task-id>     # Claim + lock
node scripts/fleet-cli.js complete <agent> <task-id>   # Complete + unlock + go idle

# Alerts and decisions
node scripts/fleet-cli.js alert <agent> "Something broke"
node scripts/fleet-cli.js decide <agent> "Need Michael to decide X"

# Lock management
node scripts/fleet-cli.js lock <agent> <task-id>
node scripts/fleet-cli.js unlock <task-id>

# Stall detection
node scripts/fleet-cli.js detect                       # Dry run: report stalls only
node scripts/fleet-cli.js detect --auto-wake           # Report + wake stalled agents
```
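`claim` and `lock` imply that only one agent can hold a task at a time. A common way to get that guarantee with plain files is exclusive creation (`O_EXCL`, the `"wx"` flag in Node). The sketch below rests on assumed conventions: one `<task-id>.lock` file per task under `locks/`, containing hypothetical `{ agent, claimedAt }` JSON. The real `fleet-cli.js` may store locks differently.

```javascript
// Sketch of task locks as files. Assumed layout (hypothetical): one
// <task-id>.lock file per task holding { agent, claimedAt } JSON.
const fs = require("fs");
const path = require("path");

function lockPath(lockDir, taskId) {
  return path.join(lockDir, `${taskId}.lock`);
}

// Create the lock exclusively: the "wx" flag fails with EEXIST when the file
// already exists, so only one of several racing agents can win the claim.
function tryLock(lockDir, agent, taskId, now = Date.now()) {
  let fd;
  try {
    fd = fs.openSync(lockPath(lockDir, taskId), "wx");
  } catch (err) {
    if (err.code === "EEXIST") return false; // already claimed
    throw err;
  }
  try {
    fs.writeSync(fd, JSON.stringify({ agent, claimedAt: now }));
  } finally {
    fs.closeSync(fd);
  }
  return true;
}

// Release the lock; force: true makes this a no-op if it is already gone.
function unlock(lockDir, taskId) {
  fs.rmSync(lockPath(lockDir, taskId), { force: true });
}
```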

## Stall Detection (Automated)

Runs every 5 minutes via cron:
- Agent not updated in >15 min → flagged as stalled
- Task with no progress in >30 min → flagged as stalled
- Locks older than 2h → treated as expired and auto-cleaned
- Stalled agents → auto-woken via HTTP bridge
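The thresholds above reduce to timestamp comparisons. Below is a minimal sketch as a pure function, under these assumptions:

- Timestamps are epoch milliseconds.
- The field names (`updatedAt`, `lastProgressAt`, `claimedAt`) are hypothetical.
- An in-progress task has a `status` of `"active"`.
- A `locks` array could be built from the files in `locks/`.

```javascript
// Sketch of the stall rules as a pure function (no I/O).
// Field names and the "active" task status are hypothetical.
const MINUTE = 60 * 1000;
const AGENT_STALL_MS = 15 * MINUTE; // agent not updated in >15 min
const TASK_STALL_MS = 30 * MINUTE; // task not progressed in >30 min
const LOCK_EXPIRY_MS = 120 * MINUTE; // locks older than 2h

function detectStalls(state, now = Date.now()) {
  const stalledAgents = Object.entries(state.agents || {})
    .filter(([, agent]) => now - agent.updatedAt > AGENT_STALL_MS)
    .map(([name]) => name);
  // Only in-progress tasks can stall; finished tasks are skipped.
  const stalledTasks = (state.tasks || [])
    .filter((t) => t.status === "active" && now - t.lastProgressAt > TASK_STALL_MS)
    .map((t) => t.id);
  const expiredLocks = (state.locks || [])
    .filter((l) => now - l.claimedAt > LOCK_EXPIRY_MS)
    .map((l) => l.taskId);
  return { stalledAgents, stalledTasks, expiredLocks };
}
```

Keeping detection free of I/O makes the thresholds easy to check. A cron script would then clean the returned expired locks and wake the returned agents.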

Logs: `/home/ccuser/shared/logs/stall-detector.log`
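Waking stalled agents over the HTTP bridge might look like the sketch below. The bridge's address and route are not documented here, so `BRIDGE_URL`, `POST /wake/<agent>`, and the log line format are all hypothetical placeholders. The sketch relies on the global `fetch` in Node 18+ and appends each attempt to the log file named above.

```javascript
// Sketch of the auto-wake step. BRIDGE_URL, the POST /wake/<agent> route,
// and the log line format are hypothetical placeholders.
const fs = require("fs");

const BRIDGE_URL = "http://localhost:8080"; // hypothetical bridge address
const LOG_FILE = "/home/ccuser/shared/logs/stall-detector.log";

function appendLog(line) {
  fs.appendFileSync(LOG_FILE, `${line}\n`);
}

// POST a wake request per agent; record success/failure and log each attempt.
async function wakeAgents(agents, { fetchFn = fetch, log = appendLog } = {}) {
  const results = {};
  for (const agent of agents) {
    const stamp = new Date().toISOString();
    try {
      const res = await fetchFn(`${BRIDGE_URL}/wake/${encodeURIComponent(agent)}`, {
        method: "POST",
      });
      results[agent] = res.ok;
      log(`${stamp} wake ${agent}: HTTP ${res.status}`);
    } catch (err) {
      // Network errors (bridge down, refused connection) land here.
      results[agent] = false;
      log(`${stamp} wake ${agent} failed: ${err.message}`);
    }
  }
  return results;
}
```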

## Token Savings

| Item | Old (Markdown) | New (JSON) |
|------|---------------|------------|
| Shared context/cycle | ~17KB (5 files) | ~1.5KB (fleet-state.json) |
| Agent startup | 5 file reads | 2 file reads (fleet-state + queue) |
| Stall detection | Manual Rivet scan | Automated cron |
| Task tracking | Kanban markdown | fleet-state.json tasks array |
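Taking the table's approximate figures at face value, the per-cycle reduction in shared context works out as:

```math
\frac{17\ \text{KB} - 1.5\ \text{KB}}{17\ \text{KB}} = \frac{15.5}{17} \approx 0.91
```

That is roughly a 91% reduction, or a read about 11× smaller (17 / 1.5 ≈ 11.3) on every heartbeat.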

## Migration

- Old `KANBAN.md` symlink preserved but deprecated
- `COMPANY.md`, `ROADMAP.md`, and `LESSONS.md` stay in `shared/` but are now read once per session instead of every heartbeat
- All six specialist agents' HEARTBEAT.md files updated with the new protocol
- Rivet's HEARTBEAT.md updated with fleet-state.json coordination
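To confirm the migration landed, a script could scan each agent's `HEARTBEAT.md` for the new scripts. This is a minimal sketch, assuming workspaces live under `/home/ccuser/<agent>/`. The marker strings are a hypothetical choice of what "uses the new protocol" means.

```javascript
// Sketch of a post-migration check. Assumptions: agent workspaces live under
// <homeDir>/<agent>/, and the MARKERS list is a hypothetical heuristic.
const fs = require("fs");
const path = require("path");

const MARKERS = ["fleet-cli.js", "fleet-update.js"];

// Return { agent: [missing markers] } for every agent not yet migrated.
// A missing HEARTBEAT.md counts as missing all markers.
function checkHeartbeats(homeDir, agents) {
  const missing = {};
  for (const agent of agents) {
    const file = path.join(homeDir, agent, "HEARTBEAT.md");
    const text = fs.existsSync(file) ? fs.readFileSync(file, "utf8") : "";
    const absent = MARKERS.filter((m) => !text.includes(m));
    if (absent.length > 0) missing[agent] = absent;
  }
  return missing; // an empty object means every agent is migrated
}
```

Usage would be along the lines of `checkHeartbeats("/home/ccuser", ["agent-a", "agent-b"])`, with the real agent names substituted.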
