Phase 1 taught the squad to learn from itself. Phase 2 lets it run on every PR without a human watching. A GitHub Actions workflow invokes /review --ci headlessly via claude -p, a bash wrapper parses a structured verdict JSON, posts a PR comment with findings tagged by severity and reviewer, and optionally triggers /review-auto to apply safe fixes back to the PR branch. Cost-budget gate runs before every review; routing cap keeps CI aggressive on cost while preserving deep-mode escalation for security-sensitive diffs. The guiding rule: the squad becomes team infrastructure by existing without being invoked.
-
/review --ci — headless mode with structured verdict
New --ci flag (auto-enabled when CLAUDE_CODE_HEADLESS=1) suppresses the usual markdown narrative and emits a single <squad-verdict-json>...</squad-verdict-json> block per docs/squad-json-schema.md v1. Step 0.4 CI_MODE detection. Step 0.5d.1 CI routing cap downgrades full-mode to thin-mode for cost savings but PRESERVES deep-mode for auth / migration / crypto paths — security matters even in CI. Step 1 + 0.5a fork on GITHUB_BASE_REF or --pr to diff against origin/<base>..HEAD because CI checkouts produce a clean working tree (the original Phase 2 draft had this wrong; the squad caught it on round 1). Step 7.5 emits JSON with authoritative findings parsing rules (section-to-tier mapping, bullet extraction, stable IDs like B1, M1, parse warnings channel). Exit codes aligned to schema: APPROVE/CONDITIONAL/SKIPPED→0, REVISE/BLOCK→1, ABORTED→2.
-
/review-auto --ci --from-verdict-json <path> — CI-mode auto-fix with verdict reuse
Skips the initial /review call when --from-verdict-json supplies a pre-computed verdict, saving the cost of two reviews per auto-fix cycle. Step 3 CI carve-out auto-proceeds when no finding has an UNSAFE class tag; aborts otherwise. Step 6.2 pushes commits back using ${GITHUB_HEAD_REF:-$(git branch --show-current)} — falls back gracefully when the workflow checked out a branch ref instead of a SHA. Step 7.5 emits post-fix verdict JSON with auto-fix augmentation keys: auto_fix_applied, auto_fix_rounds, auto_fix_items_applied, auto_fix_commits, auto_fix_status.
-
docs/squad-json-schema.md — v1 machine-parseable verdict schema
Canonical contract for the <squad-verdict-json> block. Top-level: verdict, summary, routing_mode, routing_override, nando, emily, findings, overturned_findings, chunking, files_reviewed, metadata, reason. SKIPPED / ABORTED envelope shape fully specified with fixed reason enum (classifier-skip-mode, preflight-typecheck-failed, budget-exceeded, and more). Findings carry stable IDs, nullable class tags, severity tiers (blocker / must-fix / recommended / nits / boyscout). Consumer contract explicit: reject unknown schema_version, preserve IDs, never mutate the emitted object. metadata.parse_warnings and metadata.push_error always emitted (never absent) so consumers can rely on presence. Schema evolution policy: additive changes do not bump version; renames or removals do.
-
scripts/squad-pr-review.sh — bash wrapper (600 lines)
Validates config (enum / bool / number) on load. Cap-aware cost pre-estimate (thin $0.15, full $0.60, deep $0.90, 4x multiplier above 30 files). Claude invocation uses set +e / set -e block with separate mktemp stdout + stderr captures — avoids the process-substitution race that was the original design. Python regex extraction of the verdict block (robust across single-line and multi-line sentinels). Five-pattern sed redaction for sk-ant-*, Bearer *, ghp_*, gho_*, ghs_* on every stderr surface that reaches PR comments. PR comment renders verdict with icon + heading + severity-tier findings tables; truncates detail at 600 chars with named-artifact breadcrumb; escapes <details> / </details> tokens in user-supplied finding text to prevent markdown-collapse escape. Follow-up auto-fix comment links back to primary comment via captured URL (strict regex anchored to github.com#issuecomment-<digits>). Schema-version mismatch writes a machine-readable failure JSON to the artifact AND posts a PR comment with expected/actual versions + migration doc anchor.
-
.github/workflows/review-squad.yml — Actions workflow
Triggers on pull_request opened / synchronize / reopened. Concurrency group per PR number with cancel-in-progress: true — rapid pushes cancel stale runs. Skip-check step runs first and sets an output if the HEAD commit subject matches chore(auto-fix)* — breaks the loop where /review-auto pushes would trigger new workflow runs indefinitely. All downstream steps gate on the skip output. Checkout uses ref: github.event.pull_request.head.ref (branch name, not SHA) so the runner is on a non-detached HEAD and auto-fix pushes succeed. Plugin install via claude plugin install Review_Squad with repo-clone fallback. Permissions: contents: write + pull-requests: write, documented with a least-privilege path for repos on review-only mode to tighten to contents: read.
-
.github/squad-budget.yml — per-repo CI config
Five keys: max_cost_per_pr_usd, mode, routing_cap, fail_action_on_revise, comment_on_skip. Conservative defaults (thin cap, review-only, $1.00 ceiling, fail on REVISE, no comment on skip). Inline comments explain the thin is special semantics: thin activates the CI cap that downgrades full but preserves deep; full / deep / skip are force-modes that disable deep-mode escalation. Wrapper rejects any value outside the enum / bool / positive-number shape with a clear error message.
-
commands/ci-wrapper.md — wiring guide
Two-sided install model: target repo commits the 4 infrastructure files; runner installs the squad into ~/.claude/ on every job. Four-step one-time repo setup (curl + gh secret set + budget edit + commit + PR). Four-week rollout sequence (shadow → gate → full-cap → review-and-fix) with concrete advancement criteria. Cost table per routing mode with and without chunking. PR comment anatomy. Troubleshooting section covers the likely failure modes: plugin marketplace unreachable, schema version mismatch, budget exceeded, fork PR permission downgrade. Auto-fix safe-class gate mechanism documented with the HARD-UNSAFE class list (sql-injection, auth, secret, token, jwt, permission, rbac, crypto, password, csrf, xss, ssrf). Local-testing note using act.
-
Decision-gate metric — 3-criterion rollout exit
Before advancing past shadow mode to gate mode, the squad must satisfy ALL three criteria across 5 consecutive PRs: (1) maintainer agrees with Nando’s verdict within ±1 tier on 5/5, (2) median
cost_estimate_usd ≤ 80% of max_cost_per_pr_usd, (3) zero silent failures (no run exits 0 while actually crashed). If any criterion fails on any of the 5, restart the window — the next PR becomes PR 1 of a fresh window, no averaging, no cherry-picking. Same three criteria apply across the next 10 PRs for gate → full-cap advancement.
-
Round 1: 15 genuine findings across all four reviewers
FC REVISE:
set -e + process-substitution race, schema example drift, apt-get precedence bug, plugin install fallback untested, auto-fix loop trigger. Jared REVISE: path-predictable /tmp files, unconditional contents: write permission, unredacted stderr in PR comments. Stevey REVISE: all seven connectivity blockers — classifier file resolution broken in CI (every PR would have skipped), detached-HEAD push failure, /review-auto verdict JSON never consumed, cost pre-estimate ignored routing cap, 65KB comment cap, Emily CI-skipped invisible, </details> injection from finding text. PM Cory APPROVE with 3 carry-forward questions for Nando.
-
Round 2: 1 regression caught
An over-broad
replace_all on ${RAW_OUT}.stderr accidentally matched the STDERR_OUT="${RAW_OUT}.stderr" definition line itself, producing STDERR_OUT="${STDERR_OUT}" — an unset-variable crash under set -u. Both FC and Jared flagged the same line 203 anchor independently. One-line fix: STDERR_OUT=$(mktemp).
-
Round 3: clean across all four reviewers
FC APPROVE (Quality A, Craft Solid). Jared APPROVE (Security/Efficiency/Reuse PASS). Stevey APPROVE (7/7 blockers resolved with file:line evidence). PM Cory APPROVE (plan coverage complete). Nando synthesized APPROVE. Emily CONFIRM.
-
V5.1 backlog cleared into this release
Nando and Emily flagged 4 carry-forward items as Recommended (not blockers). User requested a final iteration loop to clear them before commit: (1) decision-gate metric locked in
ci-wrapper.md, (2) schema-skew wrapper emits machine-readable failure JSON + PR comment with expected/actual versions and migration doc anchor, (3) stderr redaction applied to /review-auto invocation (defense-in-depth), (4) follow-up auto-fix comment links to primary comment via captured URL. Jared + Stevey re-reviewed, both APPROVE. Stevey caught two polish items (3-space indent that would render as a detached code block in GitHub; dangling “prior squad comment” text on URL-capture failure) — both resolved.
-
Meta observation: squad-self-review on meta-work IS worthwhile when the meta-work modifies the squad’s execution path
The user’s standing preference (
feedback_skip_squad_meta.md) was that squad-self-review on meta-work produces no signal. For V5.0 Phase 2, the squad produced 15 real findings in round 1 and caught a regression introduced in round 2. Nando proposed a documented exception: squad-self-review is worthwhile when meta-work modifies the squad’s own execution path (CI runner, wrapper, agent dispatch) because bugs there cascade across every future review. This release captures that exception in the squad’s institutional memory.
-
Phase 3 — Theme D (team shared state)
/squad-sync synchronizes team state (learnings, patterns, codebase map, review history) across a configured remote — git repo, S3 bucket, or HTTP endpoint. Per-user state (Cory’s memory of each developer’s style) stays local; shared state syncs per-push or on a schedule. Three-way merge for pattern files. .review-squad/config.json declares the sync remote + strategy. squad-sync --init migrates single-user installs without data loss. Auth uses existing git credentials for git-backed sync.
-
Phase 4 — Dashboard MVP (vanilla JS)
Local dev server extending agent-chat at
:4001, reusing existing WebSocket lifecycle events. Static export via /squad-dashboard --export. Four sections: multi-project home, chronological timeline (Chart.js), per-reviewer performance stats, pattern library browser with citation graph. Vanilla JS over Svelte (zero build step, matches /ship and changelog HTML patterns).
-
Rollout criterion for downstream repos
Phase 2 does NOT auto-enable on any downstream repo. Adoption requires the 3-criterion decision gate to pass across 5 consecutive PRs on this repo (Corye-CIC/Review_Squad) first. Real-world signal before real-world rollout.