Review Squad — Release Notes

What’s new in V5.0 Phase 1

Date 18 April 2026

Scope Self-improving squad (Theme B) + diff classifier, chunking, incremental review (Theme C)

Files updated 3 new commands (squad-learnings, squad-drift, squad-report), squad-telemetry.js extended for finding-overturned events, nando-review.md adds Overturned Findings tag section, review.md adds Step 0.5 / 1.5 / 3.6, fleet.md notes inheritance

V5.0 moves the squad from a personal tool to team infrastructure. Four themes total: self-improving (Theme B), pre-filter + chunking (Theme C), CI headless (Theme A), team shared state (Theme D), plus a vanilla-JS dashboard in Phase 4. This release ships Phase 1 — Themes B and C — laying the foundations before CI deployment and team scaling. The guiding rule: before distributing the squad to a team, make sure it can teach itself and scale to large diffs without degrading.

Theme B — Self-Improving Squad Auto-promoted patterns, drift detection, false-positive tracking

/squad-learnings — cross-project pattern promotion Reads .review-squad/<project>/learnings.jsonl across the filesystem, clusters findings by keyword affinity to existing canonical patterns in squad-patterns/recurring-blockers.md, and auto-applies new project citations to the existing Seen in: blocks when a pattern appears in 2+ projects with 3+ occurrences. Genuinely new clusters surface as ## Candidates for promotion with citation lists and suggested classification — never auto-written, always human-curated. --dry-run previews every change. --json output for the future dashboard. Logs each run to squad-patterns/.learnings-log.jsonl so /squad-report can show last-run without re-scanning.
/squad-drift — agent vs canonical reference drift Compares each agent file against its canonical reference in agents/_shared/ — 4 canonicals (father-christmas-craft, jared-security-intelligence, nando-intelligence, stevey-design-principles) mapped to 14 consuming agents. Per-section status: SYNCED (verbatim), MODIFIED (adapted, similarity above threshold), DRIFTED (below threshold), MISSING (heading absent). Reports actionable drift with specific fix instructions (“copy X section from Y into Z”). Exit code 1 on actionable drift so CI can gate on it. Scoring is line-set Jaccard — fast, transparent, false-positive-tolerant (adjust threshold via --threshold N).
/squad-report — monthly executive summary Single report combining five signals: invocations & verdicts (from squad-metrics.jsonl), auto-fix effectiveness, false-positive leaders (from finding-overturned events — see below), pattern promotions (from /squad-learnings log), and drift status (invokes /squad-drift --json unless --no-drift). Auto-derives “This-month priorities” from the data — not a placeholder. Trend-vs-prior-period column activates after the second run via .report-log.jsonl self-history.
False-positive tracking — Nando tag + telemetry capture agents/nando-review.md adds a new ## Overturned Findings (telemetry) section requiring exact-format tag lines (- REVIEWER: <agent> | CLASS: <class> | FINDING: <brief>) when Nando dismisses a reviewer's finding as a false positive. hooks/squad-telemetry.js captures these as finding-overturned events. /squad-report aggregates: a reviewer getting overturned on the same class 3+ times flags that agent's prompt for refinement — precise surgical target instead of guessing which pre-flight gate needs tightening.

Theme C — Pre-Filter + Chunking Diff classifier, per-chunk review, per-commit incremental

/review Step 0.5 — diff classifier with four modes Routes every diff to one of four modes before spawning reviewers. deep-mode: migrations, auth, crypto, RBAC, JWT code paths, or commits mentioning security — spawns full squad plus dedicated Jared audit pass. skip-mode: docs-only, under 200 lines, no code surface — exits cleanly with no squad spawn. thin-mode: test-only, lockfile bumps, or 2 small files with no frontend — FC + Jared only. full-mode: default. --mode skip|thin|full|deep overrides the classifier. In CI (CLAUDE_CODE_HEADLESS=1), classification can delegate to claude-haiku-4-5 with a 10-token max.
/review Step 3.6 — chunking for diffs over 30 files Large diffs no longer exhaust orchestrator context. At FILE_COUNT > 30, the review splits into chunks. Monorepo-aware: detects pnpm-workspace.yaml, lerna.json, turbo.json, nx.json, rush.json and groups files by workspace package. Fallback: first-level path segment. Rebalance: re-split any chunk that still exceeds 30. Cap: maximum 8 chunks, merge smallest if over. Each chunk gets its own <shared-files> block and its own roster (re-classified via Step 0.5). Chunks iterate sequentially; within each chunk, reviewers run in parallel. Nando synthesizes with a two-pass protocol: per-chunk verdicts, then cross-chunk pattern detection (systemic issues across 3+ chunks auto-promote to BLOCKER).
/review --incremental — per-commit review with carry-forward When the argument is a commit range (HEAD~3..HEAD or main..HEAD), each commit gets its own review pass with a <prior-findings> block summarizing earlier-commit findings. Reviewers look for regressions (current commit undoing an earlier fix), repeated patterns (same class of mistake across commits), and cross-commit incoherence (conflicting architectural decisions). Aggregate verdict takes the weakest per-commit verdict. Emily's final review runs once at end-state, not per-commit. User is prompted if range exceeds 10 commits.
/fleet — automatic chunking inheritance Fleet's merged-range review (/review <base>..HEAD at Step 6.5b) automatically picks up Step 0.5 classification and Step 3.6 chunking. No fleet-side changes needed — a 50-file fleet merge now chunks cleanly instead of overwhelming the orchestrator.

Decisions locked Research settled before implementation

Output format: markdown + --json flag Mirrors the /squad-metrics pattern shipped in V4.3. Primary interface is human-readable markdown; --json unlocks CI integration (Phase 2) and dashboard (Phase 4) without forcing JSON as the default.
Chunking threshold: 30 files (fixed, not configurable) Sits above the existing 20-file advisory threshold in Step 3.5 — creates a two-tier system: advisory at 20, automatic chunking at 30. One canonical default rather than per-project configuration — reduces surface area and matches the squad's “sensible default, escape hatch available” design pattern. --mode is the escape hatch when the heuristic is wrong.
Classifier model: claude-haiku-4-5 Verified current via Anthropic docs on 2026-04-18. 200k context, $1 / $5 per MTok, ideal for fast pre-filter routing. Interactive runs use orchestrator judgment (cheaper than a Haiku round-trip for a user already in session). CI runs delegate to Haiku as the cost-aware path.

Phase 1 decision gate PASS

Theme B promotes patterns on real telemetry Spot-check across campaign-management (303 learnings), llama.cpp (54), and SubAgents (173) projects. Keyword clusters matched cleanly: Type Safety at 72 hits in 2 projects, Null-Safety at 107 hits in 3 projects, Unimplemented at 23 hits in 3 projects. All three canonical patterns are correctly identified by the clustering heuristic from real labeled data.
Theme C classifier routes 10 of 10 sample diffs correctly Walk-through scenarios: README-only edit (skip), lockfile bump (thin), mixed test + component (full), migration SQL (deep), auth middleware (deep), small refactor (thin), e2e test (thin), 45-file multi-workspace (full + chunking), JWT lib (deep), 250-line changelog (full — conservative but not wrong). All routed per Step 0.5b rules without a miscategorization.

Still pending — V5.0 Phases 2-4 CI headless, team shared state, dashboard

Phase 2 — Theme A (CI + headless) GitHub Actions workflow triggering on PR open/update, claude -p headless wrapper, verdict JSON schema, cost-budget gate that invokes Step 0.5 pre-filter and escalates for approval past threshold. Two modes: review-only (default, conservative — posts verdict comment) and review-and-fix (opt-in — runs /review-auto, commits as chore(auto-fix)). Depends on Theme C chunking shipped here. Decision gate: 5 consecutive PRs on this repo with sensible verdicts before proceeding.
Phase 3 — Theme D (team shared state) Confirmed IN for V5.0. /squad-sync synchronizes team state (learnings.jsonl, patterns.md, codebase-map.md) across configured remote (git, S3, or HTTP). Per-user state (Cory's memory of each developer's style) stays local. Three-way merge for patterns. .review-squad/config.json declares sync remote + strategy. squad-sync --init migrates single-user installs without data loss.
Phase 4 — Dashboard MVP (vanilla JS) Local dev server extending agent-chat at :4001, reusing existing WebSocket lifecycle events. Static export via /squad-dashboard --export. Sections: multi-project home, chronological timeline (Chart.js), per-reviewer performance stats, pattern library browser with citation graph. Vanilla JS confirmed over Svelte — zero build step, matches /ship and changelog HTML patterns.
Breaking changes planned at V5.0 release cut Plugin-only install (deprecate curl). Command namespace consolidated to /squad-* prefix (rename /update-reviewsquad to /squad-update). New reviewSquad top-level key in ~/.claude/settings.json. One-time squad-sync --migrate for legacy .review-squad/ format if Theme D ships — back-compat shim for one version, drop in V5.1.

← V4.3 V5.0 PHASE 1 V5.0 Phase 2 →