# 0.1% Quality Audit Upgrade Plan

**Status:** PARKED
**Phase 1:** COMPLETE (API + External tests implemented)
**Phases 2-4:** PENDING
**Last Updated:** 2026-01-20

---

## Current State → Target State

| Aspect | Current (25/100) | Target (90+/100) |
|--------|------------------|------------------|
| Test Type | DB queries only | API + External + Journeys + Vitals |
| Frequency | Daily 6am | 15-min (critical), hourly (extended) |
| External Services | Config checks | Actual reachability tests |
| Business Metrics | None | SMS delivery %, call outcomes, speed-to-lead |
| Frontend | None | Core Web Vitals + Sentry |
| Alerting | Score <50 only | Graduated thresholds per category |

---

## Phase 1: API & External Service Tests - COMPLETE

### 1.1 Add API Endpoint Tests
Add new `api` category to `src/services/qualityAudit.js`:
- Health endpoint responds
- Leads API responds
- Dashboard API responds
- Call List API responds
- SMS Templates API responds
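
A minimal sketch of what an `api` category check could look like. The endpoint paths, the `Authorization` header, and the result shape are assumptions for illustration; the plan names the APIs but not their routes, and the real `runAPITests()` may differ.

```javascript
// Hypothetical endpoint paths; the plan names the APIs but not their routes.
const API_ENDPOINTS = [
  { name: 'Health endpoint responds', path: '/health' },
  { name: 'Leads API responds', path: '/api/leads' },
];

// Probe each endpoint and turn the outcome into a pass/fail test result.
// `fetchImpl` is injectable so the check can be exercised without a live server.
async function runAPITests(baseUrl, token, fetchImpl = fetch, endpoints = API_ENDPOINTS) {
  const results = [];
  for (const { name, path } of endpoints) {
    const started = Date.now();
    try {
      const res = await fetchImpl(`${baseUrl}${path}`, {
        headers: { Authorization: `Bearer ${token}` }, // assumed way of passing the service token
      });
      results.push({
        category: 'api',
        name,
        passed: res.ok, // true for any 2xx status
        status: res.status,
        durationMs: Date.now() - started,
      });
    } catch (err) {
      // Network-level errors (DNS, refused connection) count as failures.
      results.push({
        category: 'api',
        name,
        passed: false,
        error: err.message,
        durationMs: Date.now() - started,
      });
    }
  }
  return results;
}
```

Catching network errors per endpoint keeps one unreachable route from aborting the rest of the category.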

### 1.2 Add External Service Reachability Tests
Add new `external` category:
- **Twilio**: Fetch account status (verifies credentials work)
- **OpenAI**: Quick GPT-4o-mini ping ("Reply OK")
- **Deepgram**: Config validation
- **Supabase Realtime**: Subscribe/unsubscribe test
- **Slack**: Webhook config check

### 1.3 Service Token for Internal Calls
- Add `AUDIT_SERVICE_TOKEN` env var
- Let requests that present this token bypass user auth, so the audit service can call internal APIs
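
One way the bypass could be wired, sketched as an Express-style wrapper around the existing auth middleware. The `x-audit-token` header name, the `isAuditService` flag, and the function names are hypothetical; the actual change in `src/middleware/auth.js` may look different.

```javascript
// Constant-time string comparison, so response timing does not reveal
// how much of the token matched. Node's built-in crypto.timingSafeEqual
// is the usual choice; this hand-rolled version only avoids an import.
function safeEqual(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string' || a.length !== b.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Wrap the normal auth middleware: a valid service token skips it,
// everything else goes through user auth unchanged.
function withAuditBypass(expectedToken, normalAuth) {
  return function auditBypass(req, res, next) {
    const presented = req.headers['x-audit-token']; // hypothetical header name
    // An unset or empty AUDIT_SERVICE_TOKEN must never match,
    // otherwise the bypass would be open to every caller.
    if (expectedToken && safeEqual(presented, expectedToken)) {
      req.isAuditService = true; // hypothetical flag for downstream handlers
      return next();
    }
    return normalAuth(req, res, next);
  };
}
```

The explicit check for an empty `expectedToken` matters: without it, a deployment that forgot to set `AUDIT_SERVICE_TOKEN` could accept an empty header as valid.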

### Files to Modify
- `src/services/qualityAudit.js` - Add `runAPITests()`, `runExternalServiceTests()`
- `src/middleware/auth.js` - Add service token bypass
- `.env` - Add `AUDIT_SERVICE_TOKEN`

---

## Phase 2: Business Metrics & Journeys - PENDING

### 2.1 Business Metric Tests
Add new `business` category:
- SMS delivery rate (last 24h) - target >90%
- Call outcome logging rate - target >80%
- Speed-to-lead average - target <30 min
- Conversion rate tracking
- Sequence processing health (stuck enrollments)
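
The rate-based metrics above share one shape: a count, a total, and a target. A small sketch of that comparison, assuming the counts have already been queried; `rateTest` and its result fields are hypothetical names, and treating an empty window as a passing warning is an assumption based on the `warning: true` fallback described under Risk Mitigation.

```javascript
// Turn raw counts into a test result for the `business` category.
function rateTest(name, numerator, denominator, targetPct) {
  const base = { category: 'business', name };
  if (!denominator) {
    // No activity in the window: report a warning rather than a failure,
    // so a quiet day does not page anyone.
    return { ...base, passed: true, warning: true, detail: 'no data in window' };
  }
  // Multiply before dividing to keep round percentages exact.
  const ratePct = (numerator * 100) / denominator;
  return { ...base, passed: ratePct > targetPct, ratePct, targetPct };
}

// Example: 950 of 1000 messages delivered against the >90% target.
const smsResult = rateTest('SMS delivery rate (24h)', 950, 1000, 90);
```

The comparison is strict (`>`), matching the "target >90%" wording, so a rate of exactly 90% fails.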

### 2.2 Synthetic Journey Tests
Add new `journeys` category:
- Lead CRUD journey (create → read → soft delete)
- SMS template rendering flow
- Call list generation performance (<2s)
- AI service readiness check
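
A sketch of the Lead CRUD journey with guaranteed cleanup, which Phase 2 verification calls for. `repo` is a hypothetical data-access object with `create`, `read`, and `softDelete` methods, and the `is_test` marker is an assumption; neither comes from the existing codebase.

```javascript
// Walk create -> read -> soft delete and report which steps completed.
async function runLeadCrudJourney(repo) {
  const result = { category: 'journeys', name: 'Lead CRUD journey' };
  const steps = [];
  let lead = null; // non-null while a test lead still needs cleanup
  try {
    lead = await repo.create({ name: 'AUDIT_TEST_LEAD', is_test: true });
    steps.push('create');

    const found = await repo.read(lead.id);
    if (!found) throw new Error('created lead could not be read back');
    steps.push('read');

    await repo.softDelete(lead.id);
    steps.push('soft delete');
    lead = null; // already removed, nothing left to clean up

    return { ...result, passed: true, steps };
  } catch (err) {
    return { ...result, passed: false, steps, error: err.message };
  } finally {
    // Best-effort cleanup if the journey failed after the lead was created.
    if (lead) {
      try {
        await repo.softDelete(lead.id);
      } catch {
        // Ignore: the failure is already reported in the result.
      }
    }
  }
}
```

Recording completed steps makes a failed result say where the journey broke, not just that it did.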

### Files to Modify
- `src/services/qualityAudit.js` - Add `runBusinessMetricTests()`, `runSyntheticJourneyTests()`

---

## Phase 3: Frontend Quality - PENDING

### 3.1 Core Web Vitals
- Install `web-vitals` package
- Create `admin/src/utils/webVitals.js` - collect LCP, INP, CLS, FCP, TTFB
- Send metrics to `/api/analytics/vitals`
- Store in `web_vitals` table
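
A possible shape for `admin/src/utils/webVitals.js`, sketched with the `web-vitals` callbacks injected so the mapping logic stays testable outside a browser. `toVitalRow` and `collectVitals` are hypothetical names; the row fields follow the `web_vitals` migration in this plan.

```javascript
// Map a web-vitals Metric object onto the columns of the web_vitals table.
function toVitalRow(metric, pagePath) {
  return {
    metric_name: metric.name,   // 'LCP', 'INP', 'CLS', 'FCP' or 'TTFB'
    metric_value: metric.value,
    rating: metric.rating || null, // 'good' | 'needs-improvement' | 'poor'
    page_path: pagePath,
  };
}

// Register one reporter per metric.
// `handlers` is expected to be the web-vitals module exports,
// `send` posts a row to the backend, `getPath` returns the current route.
function collectVitals(handlers, send, getPath) {
  const registrations = [
    handlers.onLCP,
    handlers.onINP,
    handlers.onCLS,
    handlers.onFCP,
    handlers.onTTFB,
  ];
  for (const register of registrations) {
    register((metric) => send(toVitalRow(metric, getPath())));
  }
}
```

In the browser this would presumably be called from `admin/src/main.jsx` with the `web-vitals` exports, a `send` built on `navigator.sendBeacon` targeting `/api/analytics/vitals`, and `() => window.location.pathname` as `getPath`. Reading the path inside the callback, rather than once at startup, keeps rows accurate after client-side navigation.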

### 3.2 Frontend Error Tracking
- Install `@sentry/react` (optional, free tier)
- Create `admin/src/utils/sentry.js`
- Enhance ErrorBoundary to report to Sentry + backend
- Store errors in `frontend_errors` table

### 3.3 Frontend Quality Tests
Add new `frontend` category:
- LCP performance (avg <2500ms)
- INP performance (avg <200ms)
- CLS performance (avg <0.1)
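
The three checks above reduce to averaging stored samples and comparing against a limit. A sketch under the assumption that rows from `web_vitals` have already been fetched for the audit window; `evaluateVitals` is a hypothetical helper, not the plan's `runFrontendTests()` itself.

```javascript
// Limits from section 3.3: LCP and INP in milliseconds, CLS unitless.
const VITAL_LIMITS = { LCP: 2500, INP: 200, CLS: 0.1 };

// Produce one `frontend` test result per metric from web_vitals rows.
function evaluateVitals(rows) {
  return Object.entries(VITAL_LIMITS).map(([metric, limit]) => {
    const base = { category: 'frontend', name: `${metric} performance` };
    const values = rows
      .filter((row) => row.metric_name === metric)
      .map((row) => Number(row.metric_value)); // NUMERIC columns may arrive as strings
    if (values.length === 0) {
      // No samples yet (for example right after deploy): warn instead of failing.
      return { ...base, passed: true, warning: true, detail: 'no samples' };
    }
    const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
    return { ...base, passed: avg < limit, avg, limit };
  });
}
```

Averages are used here because the plan specifies them; they are more sensitive to a few very slow page loads than a percentile would be.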

### Files to Create
- `admin/src/utils/webVitals.js`
- `admin/src/utils/sentry.js`

### Files to Modify
- `admin/package.json` - Add `web-vitals`, `@sentry/react`
- `admin/src/main.jsx` - Initialize vitals + Sentry
- `admin/src/components/ErrorBoundary.jsx` - Add Sentry reporting
- `src/routes/analytics.js` - Add `/vitals`, `/error` endpoints
- `src/services/qualityAudit.js` - Add `runFrontendTests()`

### Migration SQL
```sql
CREATE TABLE IF NOT EXISTS web_vitals (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  metric_name TEXT NOT NULL,
  metric_value NUMERIC NOT NULL,
  rating TEXT,
  page_path TEXT,
  recorded_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS frontend_errors (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  message TEXT NOT NULL,
  stack TEXT,
  page_path TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```

---

## Phase 4: Frequent Checks & Unified Alerts - PENDING

### 4.1 Frequent Check Scheduler
- 15-min: Critical + API tests
- Hourly: External service tests
- Daily 6am: Full audit (all categories)

### 4.2 Unified Alerting via `intelligence_alerts` Table
Reuse the existing `intelligence_alerts` table instead of building a separate alerting system:
```javascript
// Insert failed test as alert
await supabase.from('intelligence_alerts').insert({
  alert_type: 'quality_audit_external', // one of the alert types listed in 4.3
  severity: 'critical',             // critical, high, medium, low
  title: 'Twilio API Unreachable',
  description: 'Twilio account fetch failed during quality audit',
  evidence: {
    test_name: 'Twilio Reachable',
    category: 'external',
    error: 'HTTP 401 Unauthorized',
    audit_score: 85,
    timestamp: '2026-01-20T06:00:00Z'
  },
  suggested_action: 'Check Twilio credentials in Railway environment variables'
});
```

### 4.3 Alert Types for Quality Audit
| alert_type | Triggers |
|------------|----------|
| `quality_audit_critical` | Critical system failure (DB, auth) |
| `quality_audit_api` | API endpoint not responding |
| `quality_audit_external` | External service unreachable |
| `quality_audit_business` | Business metric below threshold |
| `quality_audit_frontend` | Web vitals degraded |

### 4.4 Severity Mapping
```javascript
// Maps a test category to the severity of the alert it raises.
const SEVERITY_MAP = {
  critical: 'critical',  // DB down, auth broken
  api: 'high',           // API endpoints failing
  external: 'high',      // Twilio/OpenAI down
  business: 'medium',    // Metrics degraded
  frontend: 'low'        // Vitals slow
};
```
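
Combining the alert types from 4.3 with this severity mapping, a failed test could be converted into an `intelligence_alerts` row roughly as follows. `buildAlert`, the shape of the `test` argument, and the `'medium'` fallback for categories without a mapping are assumptions.

```javascript
const SEVERITY_MAP = {
  critical: 'critical',
  api: 'high',
  external: 'high',
  business: 'medium',
  frontend: 'low',
};

// Build the row to insert for one failed test.
function buildAlert(test, auditScore, now = new Date()) {
  return {
    alert_type: `quality_audit_${test.category}`,
    // Assumed fallback: categories missing from the map get 'medium'.
    severity: SEVERITY_MAP[test.category] || 'medium',
    title: test.alertTitle || `${test.name} failed`,
    description: `${test.name} failed during quality audit`,
    evidence: {
      test_name: test.name,
      category: test.category,
      error: test.error || null,
      audit_score: auditScore,
      timestamp: now.toISOString(),
    },
    suggested_action: test.suggestedAction || null,
  };
}
```

Deriving `alert_type` from the category keeps inserted rows consistent with the filter used for auto-resolution in 4.6.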

### 4.5 Deduplication
- Before inserting, look for an unresolved alert with the same `alert_type` + `title`
- If one exists and is less than 30 minutes old, skip creating a duplicate
- For a recurring failure, update the existing alert's `evidence` with the latest failure instead
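
The deduplication rule can be isolated as a small decision function, sketched below. It assumes the caller has already looked up the newest unresolved alert with the same `alert_type` + `title`, and that the row carries a `created_at` timestamp; that column name is an assumption about `intelligence_alerts`.

```javascript
const DEDUP_WINDOW_MS = 30 * 60 * 1000; // 30 minutes

// Decide what to do with a new failure.
// `existing` is the newest matching unresolved alert row, or null.
// Returns 'insert' or 'update_evidence'.
function dedupAction(existing, now = Date.now()) {
  if (!existing) return 'insert';
  const ageMs = now - new Date(existing.created_at).getTime();
  // Inside the window: keep the existing alert and refresh its evidence.
  // Outside it: raise a fresh alert so a long outage stays visible.
  return ageMs < DEDUP_WINDOW_MS ? 'update_evidence' : 'insert';
}
```

Keeping the decision separate from the database calls makes the 30-minute rule easy to check on its own.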

### 4.6 Auto-Resolution
- When a test passes after a previous failure, auto-resolve its open alert:
```javascript
await supabase.from('intelligence_alerts')
  .update({
    resolved_at: new Date().toISOString(),
    resolution_notes: 'Auto-resolved: Test now passing',
    was_true_positive: true
  })
  .eq('alert_type', 'quality_audit_external')
  .eq('title', 'Twilio API Unreachable')
  .is('resolved_at', null);
```

### 4.7 Audit History API
- `GET /api/jobs/audit-history?days=7`
- `GET /api/jobs/audit-latest`
- Alerts visible in unified `/api/alerts` endpoint (existing)

### Files to Create
- `src/jobs/frequentAudit.js`

### Files to Modify
- `src/jobs/index.js` - Add 15-min, hourly intervals
- `src/services/qualityAudit.js` - Add thresholds, alert creation
- `src/routes/jobs.js` - Add history endpoints

---

## Final Test Categories (8 total)

| Category | Weight | Tests | Frequency |
|----------|--------|-------|-----------|
| 🚨 Critical | 3x | DB, tables, auth | 15-min |
| 🔌 API | 2x | Endpoint health | 15-min |
| 🌐 External | 3x | Twilio, OpenAI, Deepgram | Hourly |
| ⚡ Core | 2x | Call list, search, SMS | Daily |
| 🤖 AI | 2x | Copilot, patterns, intel | Daily |
| 📊 Business | 2x | Delivery %, outcomes, speed | Daily |
| 🧪 Journeys | 2x | CRUD, templates, flows | Daily |
| 📱 Frontend | 1x | LCP, INP, CLS | Daily |
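
The plan does not spell out how the weights feed into the 0-100 score. One plausible reading, sketched here purely as an assumption, is a weighted average of each category's pass rate; `weightedScore` is a hypothetical name.

```javascript
// Weights from the table above.
const CATEGORY_WEIGHTS = {
  critical: 3, api: 2, external: 3, core: 2,
  ai: 2, business: 2, journeys: 2, frontend: 1,
};

// Assumed formula: 100 * sum(weight * passRate) / sum(weight),
// taken over the categories that actually ran.
function weightedScore(results) {
  const perCategory = {};
  for (const { category, passed } of results) {
    const counts = perCategory[category] || (perCategory[category] = { passed: 0, total: 0 });
    counts.total += 1;
    if (passed) counts.passed += 1;
  }
  let weighted = 0;
  let weightSum = 0;
  for (const [category, { passed, total }] of Object.entries(perCategory)) {
    const weight = CATEGORY_WEIGHTS[category] || 1; // unknown categories count once
    weighted += weight * (passed / total);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : Math.round((100 * weighted) / weightSum);
}
```

Under this formula a single failing frontend test next to fully passing critical tests costs 25 points, while a fully failing critical category next to a passing frontend category costs 75.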

---

## Verification Plan

### Phase 1 Verification
1. Run `POST /api/jobs/qualityAudit`
2. Check Slack shows API + External categories
3. Verify external service tests actually ping services
4. Confirm score includes new weights

### Phase 2 Verification
1. Check business metrics match manual calculations
2. Verify journey tests clean up test data
3. Confirm no false positives on business metrics

### Phase 3 Verification
1. Build frontend, load pages
2. Check `web_vitals` table receives data
3. Trigger error, verify Sentry + backend capture
4. Run audit, verify frontend tests execute

### Phase 4 Verification
1. Deploy, wait 15 min, check critical audit ran
2. Manually fail a test, verify Slack alert
3. Verify the 30-minute deduplication window (4.5) prevents duplicate alerts
4. Check `/api/jobs/audit-history` returns data

---

## Dependencies

| Package | Location | Size | Required |
|---------|----------|------|----------|
| `web-vitals` | frontend | ~3KB | Yes |
| `@sentry/react` | frontend | ~20KB | Optional |

---

## Environment Variables

```
AUDIT_SERVICE_TOKEN=<generate-secure-token>
VITE_SENTRY_DSN=<optional-sentry-dsn>
```

---

## Risk Mitigation

1. **Incremental**: Each phase builds on the previous one; the existing audit stays unchanged until the new tests are validated
2. **Fallbacks**: Every new test falls back to `warning: true` instead of failing when its data is missing
3. **No breaking changes**: New categories are added; old categories are preserved
4. **Cooldown**: The 30-minute deduplication window (4.5) prevents alert spam
5. **Optional Sentry**: Everything works without it; only external error tracking is lost

---

## Resume Instructions

To continue this work:
1. Read this plan
2. Start Phase 2: Add `runBusinessMetricTests()` and `runSyntheticJourneyTests()` to `src/services/qualityAudit.js`
3. Add `business` and `journeys` categories to `AUDIT_CATEGORIES`
4. Update `runFullAudit()` to include new categories