# Context Lifecycle Management for Always-On LLM Agents
*Cog's design for optimal context management across 8 continuously running agents*

## Problem Analysis
Eight agents run continuously with 100K-200K token context windows, and roughly 80% of what accumulates is routine noise. Builder just hit 104% context usage and needed an emergency reset. Context management needs to be automated, seamless, and fail-safe.

## Solution: Layered Context Architecture

### 1. Core File Structure
```
/home/ccuser/{agent}/context/
├── permanent/           # Never expires
│   ├── IDENTITY.md     # Agent's core identity
│   ├── RULES.md        # Critical behavior rules
│   ├── DECISIONS.md    # Key decisions with dates
│   └── SPECS.md        # Technical specifications
├── session/            # Current session data
│   ├── active.md       # Current context state
│   ├── stats.md        # Token usage tracking
│   └── checkpoints/    # Recovery points
├── archive/            # Compressed history
│   ├── 2026-02-17.gz   # Daily compressed logs
│   └── summaries/      # AI-generated summaries
└── temp/               # Auto-expires
    ├── scratchpad.md   # Working notes
    └── debug/          # Debug output
```
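
As a rough illustration of how this layout could be created for each agent, here is a minimal sketch. The function name `scaffold_context` and the `root` parameter are hypothetical; the agent names are taken from the monitor script later in this document, and the directory names mirror the tree above.

```python
from pathlib import Path

# Agent names as listed in the context monitor script.
AGENTS = ["rivet", "builder", "susan", "harper", "sentinel", "radar", "herald", "cog"]

# Leaf directories from the tree above; parents are created implicitly.
SUBDIRS = ["permanent", "session/checkpoints", "archive/summaries", "temp/debug"]


def scaffold_context(root: Path, agent: str) -> Path:
    """Create the context directory tree for one agent and return its path.

    Safe to call repeatedly: existing directories are left untouched.
    """
    context = root / agent / "context"
    for sub in SUBDIRS:
        (context / sub).mkdir(parents=True, exist_ok=True)
    return context
```

Calling `scaffold_context(Path("/home/ccuser"), name)` for each name in `AGENTS` would produce the structure shown above, without creating any of the files inside it.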

### 2. Token Monitoring & Thresholds
```bash
# Monitor every heartbeat
# Rough token estimate: word count x 1.3, truncated to an integer
# because [ -gt ] only compares integers
current_tokens=$(printf '%s' "$session_context" | wc -w | awk '{printf "%d", $1 * 1.3}')
context_limit=180000  # 90% of 200K window

if [ "$current_tokens" -gt "$context_limit" ]; then
    trigger_compression_cycle
fi
```

**Thresholds (for a 200K-token window):**
- **70% (140K):** Warning — start pruning temp files
- **80% (160K):** Caution — compress routine logs  
- **90% (180K):** Critical — full context reset with preservation
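
The three thresholds can be expressed as a small lookup. The sketch below is one possible shape, not the deployed logic: the names `Level`, `estimate_tokens`, and `threshold_level` are hypothetical, the 1.3 words-to-tokens factor is the rough estimate used in the shell snippet, and a threshold is assumed to trigger when usage reaches it (at or above), which the list does not state explicitly.

```python
from enum import Enum


class Level(Enum):
    OK = "ok"
    WARNING = "warning"    # 70%: start pruning temp files
    CAUTION = "caution"    # 80%: compress routine logs
    CRITICAL = "critical"  # 90%: full context reset with preservation


def estimate_tokens(text: str) -> int:
    """Rough estimate: whitespace-separated words x 1.3, truncated."""
    return int(len(text.split()) * 1.3)


def threshold_level(tokens: int, window: int = 200_000) -> Level:
    """Return the highest threshold the current usage has reached.

    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    if tokens * 100 >= window * 90:
        return Level.CRITICAL
    if tokens * 100 >= window * 80:
        return Level.CAUTION
    if tokens * 100 >= window * 70:
        return Level.WARNING
    return Level.OK
```

Passing `window` explicitly lets the same function serve agents with smaller windows, since the percentages stay fixed while the absolute token counts change.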

### 3. Content Classification System
Every context item gets classified for retention:

**PERMANENT (never delete):**
- Agent identity and core rules
- Strategic decisions with rationale
- Technical specifications and schemas
- Critical error lessons
- User preferences and constraints

**IMPORTANT (compress, don't delete):**
- Completed tasks and outcomes
- Research findings with sources  
- Configuration changes with reasons
- Performance metrics and patterns

**ROUTINE (auto-expire):**
- Heartbeat confirmations
- Status checks without issues
- Debug output older than 24h
- Temporary calculations
- File listings and directory scans

**NOISE (immediate delete):**
- Tool call confirmations
- "HEARTBEAT_OK" responses
- Duplicate status messages
- Empty function outputs
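
A rule-based classifier is one simple way to assign these four classes. The sketch below is illustrative only: the regular expressions are hypothetical stand-ins for whatever markers the agents actually emit, unrecognised items default to IMPORTANT on the assumption that compressing is safer than deleting, and only repeated routine messages are downgraded to NOISE.

```python
import re
from enum import Enum


class Retention(Enum):
    PERMANENT = "permanent"
    IMPORTANT = "important"
    ROUTINE = "routine"
    NOISE = "noise"


# Hypothetical patterns, checked in order; the first match wins.
RULES = [
    (Retention.NOISE, re.compile(r"HEARTBEAT_OK|tool call confirmed", re.IGNORECASE)),
    (Retention.PERMANENT, re.compile(r"\b(decision|rule|specification|schema|lesson|preference)\b", re.IGNORECASE)),
    (Retention.IMPORTANT, re.compile(r"\b(completed|finding|config change|metric)\b", re.IGNORECASE)),
    (Retention.ROUTINE, re.compile(r"\b(status|debug|listing)\b", re.IGNORECASE)),
]


def classify(item: str) -> Retention:
    """Classify a single context item by the first rule that matches."""
    text = item.strip()
    if not text:
        return Retention.NOISE  # empty function output
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return Retention.IMPORTANT  # assumption: unknown items are compressed, not deleted


def classify_all(items):
    """Classify items in order, treating repeated routine messages as noise."""
    seen = set()
    results = []
    for item in items:
        key = item.strip()
        label = classify(item)
        if label is Retention.ROUTINE and key in seen:
            label = Retention.NOISE  # duplicate status message
        seen.add(key)
        results.append((item, label))
    return results
```

Ordering the rules with NOISE first keeps heartbeat lines from being caught by a broader pattern further down the list.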

### 4. Compression Algorithms

**Smart Summarization:**
```python
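import re

# Lines matching any of these patterns count as routine.
ROUTINE = re.compile(r"HEARTBEAT_OK|Heartbeat.*OK|Status:.*OK|System.*healthy")

def extract_anomalies(content):
    # Placeholder assumption: any non-empty line with no routine pattern is an anomaly
    return [line for line in content.splitlines() if line.strip() and not ROUTINE.search(line)]

def compress_routine_context(content):
    # Identify routine patterns
    heartbeats = re.findall(r"HEARTBEAT_OK|Heartbeat.*OK", content)
    status_checks = re.findall(r"Status:.*OK|System.*healthy", content)

    # Compress to a single line
    summary = f"Routine: {len(heartbeats)} heartbeats, {len(status_checks)} status checks, all OK"

    # Keep only anomalies, joined into one string (a list cannot be added to a str)
    anomalies = extract_anomalies(content)

    return summary + "\nAnomalies: " + "; ".join(anomalies) if anomalies else summary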
```

**Decision Preservation:**
- Extract decision points: "I decided to..." "The approach is..." 
- Keep rationale: "Because..." "This was chosen over X because..."
- Preserve outcomes: "Result was..." "This led to..."
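
The marker phrases above lend themselves to a simple sentence filter. The following is a minimal sketch that assumes decisions are written in plain sentences containing exactly these phrases; `extract_decisions` and the three pattern names are hypothetical, and a real agent log would likely need a broader set of markers.

```python
import re

# Marker phrases taken from the list above; checked in this order.
MARKERS = [
    ("decision", re.compile(r"\b(I decided to|The approach is)\b", re.IGNORECASE)),
    ("rationale", re.compile(r"\b(Because|was chosen over)\b", re.IGNORECASE)),
    ("outcome", re.compile(r"\b(Result was|This led to)\b", re.IGNORECASE)),
]


def extract_decisions(text: str):
    """Return (label, sentence) pairs for sentences that carry a marker phrase.

    Sentences are split on whitespace that follows '.', '!' or '?'.
    Each sentence gets the label of the first marker it matches.
    """
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for label, pattern in MARKERS:
            if pattern.search(sentence):
                kept.append((label, sentence))
                break
    return kept
```

Everything that matches no marker is simply dropped from the result, so the output can be appended to `permanent/DECISIONS.md` while the rest goes through routine compression.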

### 5. Seamless Reset Protocol

**Pre-Reset Checkpoint:**
```bash
# Create recovery checkpoint (assumes $agent and $current_tokens are already set)
ctx="/home/ccuser/$agent/context"
mkdir -p "$ctx/session/checkpoints"
cd "$ctx" || exit 1  # relative paths below are resolved from the context directory

cat > "session/checkpoints/$(date +%Y%m%d-%H%M).md" << EOF
# Recovery Checkpoint - $(date)

## Current State
- Working on: $(tail -10 session/active.md | grep -E "(Task|Working|Currently)")
- Recent decisions: $(grep -E "decided|chosen|approach" session/active.md | tail -5)
- Next steps: $(grep -E "Next|TODO|Plan" session/active.md | tail -3)

## Context Stats
- Tokens: $current_tokens
- Reset reason: Context limit reached
- Preservation: $(ls permanent/ | wc -l) permanent files saved

## Resume Instructions
$(head -5 permanent/IDENTITY.md)
EOF

# Gateway restart with context reload
gateway restart
```

**Post-Reset Recovery:**
1. Load permanent files first
2. Read latest checkpoint
3. Reconstruct session state
4. Confirm continuity
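
Steps 1 and 2 can be sketched as a function that assembles the text an agent is seeded with after a restart. This is an illustration under assumptions: checkpoints are taken to live in `session/checkpoints/` as in the file structure, their timestamped names are assumed to sort chronologically, and `build_recovery_context` and `latest_checkpoint` are hypothetical names. Steps 3 and 4 depend on the gateway and are not covered.

```python
from pathlib import Path
from typing import Optional

# Load order for permanent files: identity first, then rules, decisions, specs.
PERMANENT_ORDER = ["IDENTITY.md", "RULES.md", "DECISIONS.md", "SPECS.md"]


def latest_checkpoint(context_dir: Path) -> Optional[Path]:
    """Return the newest checkpoint, or None if there are none.

    Names such as 20260217-0930.md sort chronologically as plain strings.
    """
    checkpoint_dir = context_dir / "session" / "checkpoints"
    if not checkpoint_dir.is_dir():
        return None
    checkpoints = sorted(checkpoint_dir.glob("*.md"))
    return checkpoints[-1] if checkpoints else None


def build_recovery_context(context_dir: Path) -> str:
    """Concatenate permanent files, then the latest checkpoint, into one text."""
    parts = []
    for name in PERMANENT_ORDER:
        path = context_dir / "permanent" / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    checkpoint = latest_checkpoint(context_dir)
    if checkpoint is not None:
        parts.append(f"## Latest checkpoint ({checkpoint.name})\n{checkpoint.read_text()}")
    return "\n\n".join(parts)
```

Missing permanent files are skipped rather than treated as errors, so a partially provisioned agent still recovers with whatever it has.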

### 6. Automated Rules Engine

**File-based automation via cron:**
```bash
# /etc/cron.d/context-management
*/30 * * * * ccuser /home/ccuser/scripts/context-monitor.sh
0 2 * * * ccuser /home/ccuser/scripts/daily-archive.sh
0 3 * * * ccuser /home/ccuser/scripts/cleanup-temp.sh
```
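
The cleanup job is referenced here but not shown. One way the expiry of `temp/` could work is sketched below, assuming that "auto-expires" means any file under `temp/` whose modification time is more than 24 hours old; the function name `expire_temp_files` and its parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def expire_temp_files(
    temp_dir: Path,
    max_age_seconds: float = 24 * 3600,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete files under temp_dir older than max_age_seconds; return what was removed.

    Directories are kept so the layout stays intact. `now` can be injected for testing.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletion does not disturb the traversal.
    for path in sorted(temp_dir.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Returning the list of removed paths makes it easy to log what each nightly run deleted.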

**Context monitor script:**
```bash
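#!/bin/bash
for agent in rivet builder susan harper sentinel radar herald cog; do
    cd "/home/ccuser/$agent" || continue

    # Estimate token usage: total words in .md files x 1.3, truncated to an
    # integer because [ -gt ] only compares integers
    tokens=$(find . -name "*.md" -exec wc -w {} \; | awk '{sum+=$1} END {printf "%d", sum*1.3}')

    # Check the critical threshold first so an agent is not compressed
    # and then reset on a stale token count
    if [ "$tokens" -gt 180000 ]; then
        echo "Agent $agent CRITICAL - forcing reset"
        ./scripts/emergency-reset.sh
    elif [ "$tokens" -gt 160000 ]; then
        echo "Agent $agent at $tokens tokens - compressing"
        ./scripts/compress-context.sh
    fi
done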
```

### 7. Decision Logic Matrix

| Content Type | <24h | 24h-7d | 7d-30d | >30d | Action |
|-------------|------|--------|--------|------|---------|
| Identity/Rules | KEEP | KEEP | KEEP | KEEP | Permanent |
| Decisions | KEEP | KEEP | COMPRESS | COMPRESS | Archive |
| Tasks | KEEP | COMPRESS | SUMMARIZE | DELETE | Progressive |
| Heartbeats | DELETE | DELETE | DELETE | DELETE | Noise |
| Debug | KEEP | DELETE | DELETE | DELETE | Temp |
| Research | COMPRESS | COMPRESS | ARCHIVE | ARCHIVE | Value-based |
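
The matrix translates directly into a lookup table keyed by content type and age bucket. The sketch below is one way to encode it; the names `RETENTION_MATRIX`, `age_bucket`, and `action_for` are hypothetical, and the boundary handling (exactly 7 days and exactly 30 days both fall in the 7d-30d column) is an assumption, since the table does not say which side the edges belong to.

```python
from datetime import timedelta

# Columns: <24h, 24h-7d, 7d-30d, >30d (same order as the table above).
RETENTION_MATRIX = {
    "identity_rules": ("KEEP", "KEEP", "KEEP", "KEEP"),
    "decisions": ("KEEP", "KEEP", "COMPRESS", "COMPRESS"),
    "tasks": ("KEEP", "COMPRESS", "SUMMARIZE", "DELETE"),
    "heartbeats": ("DELETE", "DELETE", "DELETE", "DELETE"),
    "debug": ("KEEP", "DELETE", "DELETE", "DELETE"),
    "research": ("COMPRESS", "COMPRESS", "ARCHIVE", "ARCHIVE"),
}


def age_bucket(age: timedelta) -> int:
    """Map an item's age to a column index of the matrix."""
    if age < timedelta(hours=24):
        return 0
    if age < timedelta(days=7):
        return 1
    if age <= timedelta(days=30):
        return 2
    return 3


def action_for(content_type: str, age: timedelta) -> str:
    """Look up the retention action for a content type at a given age."""
    return RETENTION_MATRIX[content_type][age_bucket(age)]
```

An unknown `content_type` raises `KeyError`, which surfaces classification gaps instead of silently deleting or keeping data.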

### 8. Recovery Validation

**Post-reset checklist:**
- [ ] Agent responds with correct identity
- [ ] Remembers current project context  
- [ ] Recalls recent key decisions
- [ ] Maintains behavioral consistency
- [ ] Can access permanent files
- [ ] Knows next planned actions

**Validation script:**
```bash
# Test agent continuity after reset ($agent is the agent that was just reset)
echo "Who are you?" | gateway send "$agent"
echo "What's your current main task?" | gateway send "$agent"
echo "What was the last decision you made?" | gateway send "$agent"
```

## Implementation Priority

**Phase 1 (Immediate):**
- Create permanent/ directories for all 8 agents
- Deploy token monitoring scripts
- Set up emergency reset procedures

**Phase 2 (This week):**  
- Build compression algorithms
- Automate cleanup routines
- Test recovery procedures

**Phase 3 (Next week):**
- Optimize for agent-specific patterns
- Add predictive compression
- Full automation deployment

This system is designed so that no agent hits its limit unexpectedly, critical context survives every reset, and operations continue around the clock.