DREW

Dependency Remediation & Engineering Worker: an autonomous agent that polls Dependabot alerts, bumps the vulnerable dependency (fixing any breakage), opens a PR, and shepherds it to merge, one open PR at a time. The trunk is live as a supervised prototype; side tracks add CI/CD watch, the Adjudicator review bot, and deferred incident triage.

Status: Prototype live (supervised)
Brain: Claude Code harness (headless CLI)
Runs on: Mac mini (M4, 16 GB) · Docker
Status snapshot: 2026-07-08

Principles

DREW is a coding agent: use the harness, don't rebuild it
Remediation means cloning a repo, editing manifests, building, and fixing the breakage a bump introduces. That is the Claude Code loop; rebuilding it on the raw SDK would mean reimplementing Claude Code badly. → §2a, §2b
Automating PRs without automating review just moves the bottleneck
The deliverable is the full path to merge. Trunk safety comes from a bounded, revertible change class plus CI and a deterministic verifier; the Adjudicator track exists to satisfy any required human review. → §2g, §2e, §2d
Throttled, ephemeral, auditable
One open PR at a time, highest severity first. Each run is a fresh, hard-killable container with its full transcript sent to #drew-audit + S3, and trust is earned in phases. → §2c, §1

TODO (1 / 12 done)

Legend: not started (no PR) · draft PR · open PR / in progress · done (merged)

Trunk — Dependabot remediation (v1)

The proving ground, staged so merge authority is the last thing granted.

1 · Supervisor: poll, rank, throttle, guards

Polls the Dependabot alerts API, ranks alerts highest severity first, enforces the open-PR throttle, dedups, and spawns one worker. The prototype is a local CLI authenticating as the ax-drew App; the prod daemon posture (scheduled polling, Redis guards, Secrets Manager, Slack + S3 trail) is still to land.

Awaiting sign-off: prod daemon posture (scheduled supervisor, Redis guards, Secrets Manager, #drew-audit + S3 trail)

#2300 feat(drew): Dependabot remediation prototype
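Here is a minimal sketch of one supervisor cycle. It assumes the alerts are already normalized into records carrying a severity label, a CVSS score, and a creation timestamp. The ordering follows the severity → CVSS → age ranking from the architecture notes. `Alert`, `SEVERITY_ORDER`, and `plan_cycle` are hypothetical names, and an in-memory set stands in for the Redis per-alert lock.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set

# Hypothetical urgency order; lower sorts first. Unknown labels sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Alert:
    number: int           # Dependabot alert number (also the dedup/lock key)
    package: str          # affected package name
    severity: str         # "critical" | "high" | "medium" | "low"
    cvss: float           # CVSS base score, 0.0 when the advisory has none
    created_at: datetime  # when the alert was opened


def rank_key(alert: Alert) -> tuple:
    # Severity first, then higher CVSS, then older alerts first.
    return (SEVERITY_ORDER.get(alert.severity, len(SEVERITY_ORDER)), -alert.cvss, alert.created_at)


def plan_cycle(alerts: Iterable[Alert], open_drew_prs: int, locked: Set[int],
               cap: int = 1) -> Optional[Alert]:
    """Pick at most one alert to hand to a worker this cycle (hypothetical helper).

    open_drew_prs: open DREW-authored deps PRs (the throttle input).
    locked: alert numbers already claimed (stands in for the Redis per-alert lock).
    """
    if open_drew_prs >= cap:
        return None  # throttle: at cap, wait for the next cycle
    candidates = [a for a in alerts if a.number not in locked]
    return min(candidates, key=rank_key) if candidates else None
```

Because the cycle is a pure function of (alerts, open-PR count, locks), the throttle is easy to test, and the Redis and GitHub calls stay at the edges.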

2 · Worker cage + egress jail + harness wiring

A disposable container running claude -p with a constrained tool surface; its only network route is the default-deny iron-proxy egress jail. The prototype ships the hardened container and the mandatory jail in warn mode; enforce mode and credential-swap are still to land. → §2e

Awaiting sign-off: egress jail flipped warn → enforce after a clean remediation
Awaiting sign-off: credential-swap live (the worker holds only opaque proxy tokens for GitHub + Anthropic)

#2300 feat(drew): Dependabot remediation prototype

3 · Remediation engine: bump + build + fix, dry-run

Given one alert, the worker resolves the patched version, bumps the manifest/lockfile, builds, and edits source until it compiles — emitting a diff + build result with nothing pushed. → §2d

#2300 feat(drew): Dependabot remediation prototype

4 · Supervised live runs: operator-triggered, human-gated

First live writes: each run is operator-triggered and each merge passes required review; the worker opens a ready PR, arms auto-merge, shepherds CI to green, and exits MERGED | PARKED | BAILED. The live record below is the calibration data; the deterministic out-of-agent verifier must be built and validated against these diffs before phase 5 (§3.6).

Awaiting sign-off: deterministic out-of-agent verifier built + validated against the live diffs (no false passes)

#2305 chore(deps): bump axios to 1.16.0
#2306 chore(deps): bump rustls-webpki to 0.103.13
#2307 build(deps): bump js-cookie to 3.0.7
#2308 chore(deps): bump uuid to 11.1.1
#2311 build(deps): bump protocol-buffers-schema to 3.6.1
#2313 build(deps): bump postcss to 8.5.10
#2314 build(deps): bump esbuild to 0.25.0
#2315 build(deps): bump ws to 8.20.1

5 · Lights-out: scheduled supervisor, merge on green

Unattended operation: a scheduled supervisor that merges without a per-PR human, via either checks-only branch protection for drew/deps/* or the Adjudicator supplying the review. It lands narrow (lockfile/patch first) and widens as the record justifies. It requires the phase-1 daemon, the phase-2 enforce-mode jail, and the phase-4 verifier.

CI/CD — pipeline watch & speed

Make rust-test/rust-clippy faster while provably keeping coverage — needs only the cage and Actions: read, not merge authority. Branches from step 2.

C1 · Pipeline telemetry: measure before touching

Read-only: pull per-job timings, queue waits, and cache hit rates from the Actions API, persist the series, and produce a hotspot ranking. No opinion without data.
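Here is a sketch of the hotspot ranking. It assumes per-job samples (job name, run time, queue wait, cache hit) have already been pulled from the Actions API and persisted. `JobSample` and `hotspots` are hypothetical. The ranking uses median wall-clock (run + queue), which is one reasonable metric rather than a decided one.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class JobSample:
    job: str          # e.g. "rust-test"
    run_s: float      # execution time in seconds
    queue_s: float    # time spent waiting for a runner
    cache_hit: bool   # whether the build cache was restored


def hotspots(samples: List[JobSample]) -> List[Tuple[str, float, float, float]]:
    """Return (job, median wall-clock s, median queue s, cache-hit rate), slowest first."""
    by_job: Dict[str, List[JobSample]] = {}
    for s in samples:
        by_job.setdefault(s.job, []).append(s)
    rows = []
    for job, runs in by_job.items():
        wall = median(r.run_s + r.queue_s for r in runs)
        queue = median(r.queue_s for r in runs)
        hit_rate = sum(r.cache_hit for r in runs) / len(runs)
        rows.append((job, wall, queue, hit_rate))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```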

C2 · Optimization PRs: human-merged, coverage-proven

Propose pipeline speedups, each carrying before/after timing and a deterministic coverage proof that the test/lint set didn't shrink; every CI change is human-merged since the App lacks Workflows: write (§3.17). → §3.18

C3 · Regression watchdog: ratchet the wins

Alert when main's CI wall-clock regresses, bisect to the commit, and file the issue with evidence — the ratchet that makes this ambient infra.
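The bisect step can be sketched as a binary search over the commits between the last known-good and first known-bad runs. The sketch assumes a regression persists once introduced, so the predicate is monotonic. `is_slow(commit)` is a hypothetical callback that re-measures one commit's wall-clock against a baseline plus tolerance. Real CI timings are noisy, so a production watchdog would likely repeat each probe.

```python
from typing import Callable, Optional, Sequence


def first_slow_commit(commits: Sequence[str], is_slow: Callable[[str], bool]) -> Optional[str]:
    """Binary-search for the first commit whose CI wall-clock regressed.

    commits: ordered oldest -> newest; assumes that once a commit is slow, all later ones are.
    Returns None if no commit is slow.
    """
    lo, hi = 0, len(commits)  # invariant: the first slow index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return commits[lo] if lo < len(commits) else None
```

This costs about log2(n) probes, which matters when each probe is a full CI run.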

Adjudicator — autonomous PR review

A separate review bot that auto-approves only under very conservative thresholds and defers to a human on any doubt; its authority is earned through a shadow period first. Branches from step 2.

A1 · Shadow reviewer: advisory comments, no authority

Reviews every DREW PR (correctness, scope, supply-chain) and posts comments only; verdicts are scored against human outcomes to measure the false-pass rate before any authority is requested. → §2g

A2 · Conservative auto-approve: DREW's bounded class only

A separate App (ax-adjudicator) may submit an approving review only when every threshold passes — bounded diff class, verifier passed, CI green, zero findings, calibrated false-pass rate; any doubt → defer. → §3.14

A3 · Widen: other mechanical PR classes, defer-by-default

Extend to other low-blast-radius mechanical classes one at a time, each with its own shadow calibration; human feature work stays out of scope.

Incident triage — diagnose & draft a fix

The original mission, deferred: draft-only, human-gated, never auto-merged, since untrusted prod logs are an injection surface. Branches from step 2.

T1 · Incident triage: diagnose & draft a fix (future)

Reuses the supervisor/worker/egress/audit stack with read-only senses (incident.io / ClickHouse / Sentry) and a triage pipeline, but inverted: open-ended root cause over untrusted prod logs, so draft-only, human-gated, never auto-merged.

Design Questions

Auth / billing: subscription or metered API key?

Decided: Metered API key. Anthropic's 2026-06-15 change metered headless subscription use at API per-MTok rates, which eliminated the flat-rate advantage. The API key then wins on operability, on being a swappable header credential for iron-proxy credential-swap (§2e), and on being ToS-clean with both the CLI and the Agent SDK. Because cost is metered, DREW reports its own estimate and realized spend (§3.6).

Autonomy: how far does DREW go unsupervised?

Decided: Shepherd → auto-merge, earned in stages. DREW opens a ready PR, arms auto-merge, and shepherds CI to green. Today (phase 4) two human gates remain: the operator trigger and the required review. Phase 5 removes the trigger and resolves the required review via checks-only protection or the Adjudicator (§3.12). This is defensible because the class is bounded, CI is the gate, the verifier backstops it, and the throttle caps the damage of a bad merge at one git revert.

Egress enforcement: harness sandbox, or network-layer proxy?

Decided: iron-proxy at the container layer is the real boundary; the harness sandbox is hostname-based and serves only as defense-in-depth. iron-proxy was chosen for default-deny allowlisting, TLS-terminating rules, MCP tools/list filtering, and credential-swap. It is built and mandatory in the prototype (py/drew/egress/) and runs in warn mode. The allowlist is Anthropic, GitHub, and the package registries. Flip warn → enforce after a clean run.

GitHub credential: fine-grained PAT or a GitHub App?

Decided: GitHub App ax-drew (CARL §3), built. DREW opens PRs as app/afintech-drew, repo-filtered to architect-xyz/ax. Minimal permissions:
- Contents and Pull requests: write
- Checks, Statuses, and Actions: read (the shepherd needs the status rollup and failing logs)
- Dependabot alerts: read
- Issues: write
- Metadata: read

Explicitly not granted: Administration, Workflows, Actions: write, Members, Secrets. Merge is gated by branch protection, not the token (§3.12). Withholding Workflows puts Actions-ecosystem bumps out of scope for v1 and constrains the CI/CD track.

Ingest mechanism: poll the alerts API, or a webhook?

Decided: Poll the Dependabot alerts API on an interval. Polling needs no inbound endpoint (egress-jail-friendly) and is idempotent against the throttle. A lag of a few minutes is irrelevant; a webhook is a later latency optimization.
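Polling can stay a thin, pure transform over the REST response. The sketch below assumes the shape of GitHub's `GET /repos/{owner}/{repo}/dependabot/alerts?state=open` payload, including fields such as `security_advisory.severity`, `security_advisory.cvss.score`, and `security_vulnerability.first_patched_version.identifier`. Verify those field names against the current API reference before relying on them. `normalize` and `_get` are hypothetical helpers. The HTTP call is left out so that re-running the transform on the same payload gives the same result.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def _get(d: Any, *path: str) -> Any:
    """Walk nested dict keys, returning None if any level is missing or null."""
    for key in path:
        if not isinstance(d, dict):
            return None
        d = d.get(key)
    return d


def normalize(raw_alerts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep open alerts that have a patched version; flatten the fields DREW ranks on."""
    out = []
    for a in raw_alerts:
        patched: Optional[str] = _get(a, "security_vulnerability", "first_patched_version", "identifier")
        if a.get("state") != "open" or patched is None:
            continue  # nothing to bump to yet, or not an open alert
        out.append({
            "number": a["number"],
            "ecosystem": _get(a, "dependency", "package", "ecosystem"),
            "package": _get(a, "dependency", "package", "name"),
            "severity": _get(a, "security_advisory", "severity"),
            "cvss": _get(a, "security_advisory", "cvss", "score") or 0.0,
            "ghsa": _get(a, "security_advisory", "ghsa_id"),
            "target": patched,  # pin target: first_patched_version (§3.11)
            # Timestamps end in "Z"; older Pythons' fromisoformat rejects it, so use an offset.
            "created_at": datetime.fromisoformat(a["created_at"].replace("Z", "+00:00")),
        })
    return out
```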

Per-remediation budget: what are the caps?

Decided: Meter + report; wall-clock + attempt cap. DREW logs an up-front estimate and the realized per-MTok spend (BARD-style); this is reporting, not a hard cap. The runaway guards are a wall-clock docker kill backstop and a cap on shepherd fix attempts. A third lever, context-window degradation, starts as a hard kill at 50% of the window (RULER finds roughly 50–65% of the window usable) and is tuned from phase-4 data.
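The three runaway levers reduce to a small pure check that the supervisor can run on every usage event, with spend computed alongside but never used to kill. This is a minimal sketch with several assumptions:
- Usage events carry the Anthropic API's usage fields (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`).
- The context window size is supplied per model.
- Prices are caller-supplied, so no real rates are hard-coded.
- `Limits`, `Prices`, and `kill_reason` are hypothetical, and the wall-clock and attempt-cap defaults are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prices:
    # USD per million tokens; supply the current published rates (inputs, not constants).
    input: float
    output: float
    cache_write: float
    cache_read: float


@dataclass(frozen=True)
class Limits:
    window_tokens: int              # model context window, supplied per model
    max_context_frac: float = 0.5   # hard-kill threshold, starting at 50%
    max_wall_s: float = 3600.0      # placeholder wall-clock backstop
    max_fix_attempts: int = 3       # placeholder shepherd attempt cap


def prompt_tokens(usage: dict) -> int:
    """Total prompt size of one turn: uncached + cache-written + cache-read input."""
    return (usage.get("input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0))


def turn_cost_usd(usage: dict, p: Prices) -> float:
    """Realized spend for one turn, for reporting only."""
    return (usage.get("input_tokens", 0) * p.input
            + usage.get("output_tokens", 0) * p.output
            + usage.get("cache_creation_input_tokens", 0) * p.cache_write
            + usage.get("cache_read_input_tokens", 0) * p.cache_read) / 1_000_000


def kill_reason(last_usage: dict, elapsed_s: float, fix_attempts: int, lim: Limits) -> Optional[str]:
    """Return why the worker should be stopped, or None to let it run."""
    if elapsed_s >= lim.max_wall_s:
        return "wall-clock"
    if fix_attempts > lim.max_fix_attempts:
        return "attempt-cap"
    if prompt_tokens(last_usage) >= lim.max_context_frac * lim.window_tokens:
        return "context-window"
    return None
```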

Repo working copy: fresh clone per remediation, or a warm cache?

Decided: A fresh shallow clone per remediation, destroyed with the container, because ephemerality is a security property. git clone --reference is a latency optimization that preserves it. This is distinct from the warm CARGO_HOME/target build cache (§2f).

Scope: which bumps does DREW take, and in what order?

Decided: Widen by risk class. Unattended merge lands on the safest class first (lockfile-only / patch-level bumps, 5.1), then minor bumps, then breaking-change fixes (5.2) as the record earns it. Ecosystem scope is cargo + npm; Actions/workflow bumps are excluded (§3.4). Phase 4 attempts every class, since a human approves each merge.
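One way to make "widen by risk class" mechanical is to classify each bump by semver level and gate unattended merge on a per-stage allowed set. This sketch has several assumptions:
- Versions are plain `MAJOR.MINOR.PATCH` strings; pre-release tags are not handled.
- The lockfile-only vs manifest distinction is not modeled.
- `classify`, `STAGES`, and `may_merge_unattended` are hypothetical names.

It does apply the cargo/npm caret rule that a 0.x minor bump is breaking. The stricter 0.0.x case is not special-cased.

```python
from typing import Dict, FrozenSet, Tuple


def parse(v: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch


def classify(old: str, new: str) -> str:
    """Return "patch", "minor", or "major" (breaking) for an upgrade old -> new."""
    o, n = parse(old), parse(new)
    if n <= o:
        raise ValueError(f"{new} is not an upgrade from {old}")
    # Under caret rules, 0.x -> 0.y is breaking, just like a major bump.
    if o[0] != n[0] or (o[0] == 0 and o[1] != n[1]):
        return "major"
    return "minor" if o[1] != n[1] else "patch"


# Hypothetical rollout stages: patch first, then minor, then breaking-change fixes.
STAGES: Dict[int, FrozenSet[str]] = {
    1: frozenset({"patch"}),
    2: frozenset({"patch", "minor"}),
    3: frozenset({"patch", "minor", "major"}),
}


def may_merge_unattended(level: str, stage: int) -> bool:
    return level in STAGES[stage]
```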

Host: Mac mini or EC2 (like BARD)?

Decided: A single Mac mini (M4, 16 GB). EC2 would mirror ax-bard exactly. The Mac mini wins on flat capex (~$799 one-time vs ~$256/mo, so the hardware costs roughly three months of EC2), M4 build speed, and ownership; EC2 Mac is ruled out. Trade-offs: ops and uptime fall on us (a physical single point of failure), and the egress jail has to live on the Docker VM side. The 16 GB cargo check concern is handled by the one-at-a-time throttle plus CI as the compile authority (§2f).

Prior art: what's worth lifting from CARL & GOPHER?

Decided: Mined (§2a).
- Adopted: alert polling + severity ranking; auto-merge of green bumps; iron-proxy egress + credential-swap + tool filtering; the container-hardening recipe; the App minimal-permission set + signed commits; the context-window monitor (reframed to 50%); warm cargo/sccache; S3 + #drew-audit; metered-key self-metering; the shared guard/ gate.
- Promoted: the deterministic verifier (§2d).
- Dropped: /drew pause.
- Diverged: the harness, the Mac mini, fixing breaking changes.

Target version: the advisory's first patched version, or latest?

Decided: Pin to first_patched_version, the minimal bump that clears the advisory. "Latest" maximizes unrelated breaking changes and can pull in a newer release carrying a fresh, not-yet-flagged vulnerability. If the minimal version is yanked or unbuildable, step to the next viable release and note it.
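The pinning rule, including the fallback past a yanked or unbuildable release, can be sketched as follows: take the smallest released version at or above `first_patched_version` that is not yanked and passes a build probe. `pick_target`, the `yanked` set, and the `builds` callback are hypothetical. Versions are assumed to be dotted integers and are compared numerically, so 1.10.0 sorts after 1.3.0.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def vkey(v: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def pick_target(first_patched: str, released: Iterable[str], yanked: Set[str],
                builds: Callable[[str], bool]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (target, skipped): the minimal viable version >= first_patched.

    skipped lists (version, reason) pairs so the PR can note why it stepped past them.
    """
    skipped: List[Tuple[str, str]] = []
    floor = vkey(first_patched)
    for v in sorted(released, key=vkey):
        if vkey(v) < floor:
            continue  # still inside the vulnerable range
        if v in yanked:
            skipped.append((v, "yanked"))
            continue
        if not builds(v):
            skipped.append((v, "unbuildable"))
            continue
        return v, skipped
    return None, skipped
```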

Merge gate: how does auto-merge reconcile with branch protection?

Open: Merge is whatever branch protection on main allows, with no bypass. The worker arms gh pr merge --auto, so where a required human review exists, the PR parks until approved (degrading cleanly to supervised mode). The decision has sharpened into the Adjudicator question, with three options:
(a) keep a required reviewer;
(b) use checks-only protection for drew/deps/*;
(c) let a calibrated bot satisfy the review under conservative thresholds.

Start with (a), and let A1's shadow record decide between (b) and (c).

Native Dependabot PRs: own the lane, or adopt them?

Open: GitHub's own security-update PRs would duplicate and race DREW. Options: (a) disable native PRs so DREW is the sole remediator on its own drew/deps/* branches, or (b) have DREW adopt the existing Dependabot branch. Leaning (a) for a clean ownership boundary. Either way, DREW reads the same alerts API, and the throttle counts DREW-authored PRs only.

Adjudicator identity: can DREW review its own PRs?

Decided: No; use a separate App (ax-adjudicator). GitHub refuses self-approval, so ax-drew can't satisfy a required review on its own PRs. The constraint enforces separation of duties and turns disagreements into auditable signal (§2g). Permissions: Pull requests: write; Contents, Checks, Statuses, and Actions: read. No merge, no workflows.

Adjudicator thresholds: what does "very conservative" mean concretely?

Open: The A1 → A2 gate needs numbers. Candidate: auto-approve only when all of the following hold:
(1) the diff is in a deterministically recognizable mechanical class;
(2) the verifier passed;
(3) full CI is green;
(4) the review found zero findings;
(5) the shadow false-pass rate is below target over the most recent N ≥ 50 reviews.

Still open: the value of N, the target rate, calibration staleness, and whether critical-severity bumps always defer.
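The candidate A2 gate is an all-or-nothing conjunction in which missing evidence counts as a failed check. Here is a minimal sketch under these assumptions:
- N and the target rate are passed in, since both are still open.
- Calibration staleness is not modeled.
- Critical-severity bumps can optionally always defer.
- The false-pass rate is defined as the share of would-approve shadow verdicts where a human found a problem. That is one possible definition, not a decided one.
- `Evidence`, `false_pass_rate`, and `adjudicate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Evidence:
    mechanical_class: bool     # (1) diff recognized as a bounded mechanical class
    verifier_passed: bool      # (2) deterministic verifier result
    ci_green: Optional[bool]   # (3) full CI; None means still running or unknown
    findings: int              # (4) number of review findings
    severity: str              # advisory severity of the bump


def false_pass_rate(shadow: Sequence[Tuple[bool, bool]]) -> float:
    """shadow: (bot_would_approve, human_found_problem) pairs from the A1 record."""
    approvals = [problem for would_approve, problem in shadow if would_approve]
    if not approvals:
        return 1.0  # no approvals recorded yet: treat as uncalibrated
    return sum(approvals) / len(approvals)


def adjudicate(ev: Evidence, shadow: Sequence[Tuple[bool, bool]], n_min: int,
               target_rate: float, critical_always_defers: bool = True) -> str:
    recent = list(shadow)[-n_min:]
    checks = [
        ev.mechanical_class,
        ev.verifier_passed,
        ev.ci_green is True,
        ev.findings == 0,
        len(recent) >= n_min and false_pass_rate(recent) < target_rate,
        not (critical_always_defers and ev.severity == "critical"),
    ]
    return "approve" if all(checks) else "defer"  # any doubt -> defer
```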

Adjudicator approval: does a bot review satisfy branch protection, and do we want it to?

Open: Mechanically, an App with Pull requests: write can approve, and a plain "require 1 approval" rule counts that approval. Rulesets (Code Owners, restricted reviewers) need auditing first. The policy question is separate: do we want merges gated on a bot approval (c), or do we drop the requirement for the bounded class (b)? Decide after A1 produces a false-pass record.

CI/CD lane permissions: who can edit .github/workflows?

Open: The trunk withholds Workflows: write from ax-drew (§3.4), so C2's optimization PRs can't even push a workflow edit. Options:
(a) a separate, narrowly granted ax-drew-ci App used only by C2, with every PR human-merged;
(b) DREW drafts the diff as an artifact that a human applies;
(c) scope C2 to non-workflow speedups only.

Leaning (b) to start and (a) if the lane proves out. Withholding workflow write was deliberate and shouldn't be quietly reversed.

CI/CD coverage invariant: how do we prove "same coverage, faster"?

Open: The candidate is a deterministic coverage-set diff. Enumerate the executed test set and the enabled lint set on main and on the branch, and require the diff to be empty or additive, enforced as a CI check on every C2 PR. Still open:
- flaky-test quarantine (removing a flaky test is a coverage change);
- sharding that masks order-dependent tests;
- whether feature-matrix reductions count as shrinkage (they do).
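The empty-or-additive rule becomes a set comparison once both sides are enumerated. This sketch assumes the executed tests and enabled lints have already been collected as identifier strings on main and on the branch; how to enumerate them is left open. If each test identifier includes its feature combination (for example `features=all::crate::test_name`), a feature-matrix reduction shows up as missing entries. `coverage_check` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def coverage_check(main_tests: Iterable[str], branch_tests: Iterable[str],
                   main_lints: Iterable[str], branch_lints: Iterable[str]
                   ) -> Tuple[bool, Dict[str, List[str]]]:
    """Pass only if the branch runs every test and lint that main runs; additions are fine."""
    missing = {
        "tests": sorted(set(main_tests) - set(branch_tests)),
        "lints": sorted(set(main_lints) - set(branch_lints)),
    }
    ok = not missing["tests"] and not missing["lints"]
    return ok, missing
```

As a CI check, the job would fail whenever `ok` is false and print `missing` as the evidence.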

Scratchpad

Long-form reference and working notes, mostly write-only, for LLM/agent use.

Why DREW is not BARD-shaped

BARD and DREW look like siblings — both LLM bots in the AX stack — but sit on opposite sides of one line: read-only analysis vs. autonomous code change.

	BARD (analyst)	DREW (engineer)
Job	Answer BI questions over Postgres/ClickHouse	Remediate a vulnerable dependency end-to-end
Tool surface	5 narrow read tools	Filesystem, grep, build, `git`, `gh` — open-ended
LLM plumbing	Hand-rolled loop on raw Anthropic SDK	The Claude Code harness (it already is this loop)
Side effects	None — `sql_safety` rejects non-SELECT	Edits manifests, opens a PR, merges on green CI

For BARD's tiny, guarded tool surface a bespoke raw-SDK loop is right. For DREW the tool surface is a coding agent — and that loop, with its context management, permissioning, and sandbox, is what Claude Code already implements. DREW borrows BARD's operational skeleton (supervisor, Redis guards, AWS secrets, container deploy) but its brain is the harness; the only custom code is the orchestration around it.

Prior art: CARL & GOPHER. Two earlier RFCs (PR #1760 and PR #1807, both closed unmerged, Apr–May 2026) designed the same supervisor + sandboxed headless Claude Code shape for exactly this Dependabot job. Both deferred incident triage to a separate RFC, which is now the incident-triage track. They independently reached the harness conclusion, worked out the sandbox/egress/budget machinery (mined into §2e, §2f, §3.10), and proposed auto-merging safe bumps. DREW diverges by riding the harness (not a raw loop), running on a Mac mini, and fixing the breaking changes a bump introduces.

The harness decision

The question isn't "raw SDK vs harness" (the harness wins for a coding agent) but which form:

Approach	Coding loop	Isolation	Billing / ToS
Raw Anthropic SDK (BARD's way)	Build it all by hand	In-process	API key only
Claude Code CLI (`claude -p`)	Built-in	Subprocess — fresh, hard-killable, resource-capped	API key or subscription is ToS-clean
Claude Agent SDK	Built-in (same engine)	In-process (a hang risks the supervisor)	API key only

Two findings, both pointing to the CLI: (1) Billing no longer forces the hand — isolation does. As of Anthropic's 2026-06-15 change, subscription headless use is metered at the same per-MTok rates as the API, so the flat-rate edge is gone (§3.1); DREW runs on a metered API key, ToS-clean with both CLI and Agent SDK, leaving the choice to isolation. (2) Subprocess isolation is a feature for a security-sensitive agent — spawn each remediation as a throwaway, network-jailed, hard-killable unit and docker kill a runaway; dynamic permissions are recovered via PreToolUse hooks plus a scoped token.
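To illustrate how PreToolUse hooks can recover dynamic permissions, here is a minimal hook script. It assumes Claude Code's hook contract: the event arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call and feeds stderr back to the model. Check the hooks reference for the current contract. The deny patterns are hypothetical examples, not DREW's actual policy. The egress jail remains the real boundary, so a hook like this is defense-in-depth only.

```python
import json
import re
import sys
from typing import List, Pattern, Tuple

# Hypothetical deny-list for Bash tool calls; a real policy would be allowlist-first.
DENY: List[Pattern[str]] = [
    re.compile(r"\bgit\s+push\b.*(--force|-f\b)"),    # no history rewrites
    re.compile(r"\bgh\s+pr\s+merge\b(?!.*--auto)"),   # merges only by arming auto-merge
    re.compile(r"\bcurl\b|\bwget\b"),                  # no ad-hoc downloads
]


def decide(event: dict) -> Tuple[int, str]:
    """Return (exit_code, message): 0 allows the tool call, 2 blocks it."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = (event.get("tool_input") or {}).get("command", "")
    for pattern in DENY:
        if pattern.search(command):
            return 2, f"blocked by DREW policy: {pattern.pattern}"
    return 0, ""


if __name__ == "__main__":
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # surfaced to the model when blocking
    sys.exit(code)
```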

Decision. A thin Python supervisor invokes the Claude Code CLI in headless mode, one fresh sandboxed container per remediation, on a metered API key. The raw SDK is rejected; the Agent SDK is held in reserve (it trades away the subprocess isolation the security model leans on).

Architecture: supervisor / worker

The same split BARD and NATE use — a cheap, long-lived service that does constrained routing, and an expensive, disposable brain that does the open-ended work. The supervisor is dumb and never dies; the worker is smart and always dies.

Dependabot alerts API ──(poll)──┐
                                ▼
┌──────────────────────────────────────────────────────────────┐
│  DREW SUPERVISOR  (long-lived Python service)                 │
│   • poll open alerts; rank by severity → CVSS → age           │
│   • THROTTLE: open DREW deps PRs at cap? → wait this cycle     │
│   • Redis: per-alert lock, concurrency gate = 1, timers       │
│   • dedup; spawn ONE worker for the top group                 │
│   • relay PR link + merged/parked/bailed to #drew-audit       │
└───────────────────────────────┬──────────────────────────────┘
                                │ docker run --rm (fresh, jailed)
                                ▼
┌──────────────────────────────────────────────────────────────┐
│  DREW WORKER  (ephemeral container = `claude -p` harness)     │
│   fresh shallow clone @ main                                  │
│   bump → just rs/format → cargo check → fix call-site breakage│
│   verifier → gh pr create → arm auto-merge → /shepherd        │
│   exits DREW_STATUS: MERGED | PARKED | BAILED                 │
│   egress: allowlist proxy (Anthropic · GitHub · crates/npm)   │
└──────────────────────────────────────────────────────────────┘

The supervisor holds all durable state and secrets; the worker gets a narrow slice for one remediation and is destroyed (--rm). The worker sets its own stopping point. /shepherd has no natural end, so the worker exits with a parseable DREW_STATUS, and that exit is the event the supervisor reacts to. On PARKED, the supervisor schedules drew resume, which re-attaches in medias res. In prod, the throttle, concurrency gate, timers, and dedup live in Redis (BARD's guard/).
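Because the worker's exit line is the event the supervisor reacts to, parsing should be strict. The sketch takes the last `DREW_STATUS:` line, accepts only the three known values, and treats anything else (a crash, a kill, a malformed line) as a bail. It assumes the status appears on its own line in the worker's final text output. `parse_status`, `next_action`, and the action names are hypothetical.

```python
import re
from typing import Iterable

STATUS_RE = re.compile(r"^DREW_STATUS:\s*(MERGED|PARKED|BAILED)$")


def parse_status(output_lines: Iterable[str]) -> str:
    """Return the last valid DREW_STATUS in the worker's output, defaulting to BAILED."""
    status = "BAILED"  # no parseable status means the run did not finish cleanly
    for line in output_lines:
        m = STATUS_RE.match(line.strip())
        if m:
            status = m.group(1)
    return status


def next_action(status: str) -> str:
    # Hypothetical supervisor reactions to each exit status.
    return {
        "MERGED": "release-throttle-and-report",
        "PARKED": "schedule-drew-resume",
        "BAILED": "escalate-and-report",
    }[status]
```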

The alert → merged-PR pipeline

One alert group, one worker, one PR. The agent's leverage is in steps 4 and 6 (fixing the breakage a bump introduces and driving CI to green); a script handles the rest.

#	Step	What happens
1	Resolve	Read the alert; decide direct vs transitive; target is the advisory's `first_patched_version` (§3.11), not the newest release.
2	Branch	`drew/deps/<crate>-<ver>` off `main`.
3	Bump	Direct → edit `Cargo.toml`, then `cargo update -p <crate> --precise <ver>`; transitive → lockfile only. `just rs/format`.
4	Build & fix	`cargo check` (warm cache, §2f); if the bump broke call sites, edit source until it compiles. Real-redesign bumps hit the ceiling → bail.
5	Verify	(deterministic, out-of-agent) diff touches only manifest/lockfile + plausible source; no surprise dep or secret; build passed.
6	PR & shepherd	`gh pr create` (ready), arm auto-merge, `/shepherd`: watch CI, fix failures (≤ N), push. Exit MERGED / PARKED / BAILED.
7	Report	One line to `#drew-audit`: alert, CVE/GHSA, bump, PR link, outcome.
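The scripted steps around the agent (2, 3, and 6) are deterministic enough to be generated as a command plan. This sketch covers the cargo case. It assumes that for a direct dependency the `Cargo.toml` requirement has already been edited to admit the target. The `git switch -c`, `cargo update --package … --precise …`, `gh pr create`, and `gh pr merge --auto --squash` invocations are real CLI forms. `plan_commands`, the squash merge method, and the PR body text are assumptions. Each argv list could be run with `subprocess.run(argv, check=True)`, with the agent's build-and-fix loop between the bump and the commit.

```python
from typing import List


def plan_commands(crate: str, version: str) -> List[List[str]]:
    """Return argv lists for the scripted steps of one remediation (hypothetical helper)."""
    branch = f"drew/deps/{crate}-{version}"
    title = f"chore(deps): bump {crate} to {version}"
    return [
        ["git", "switch", "-c", branch],                                  # step 2: branch off main
        # Step 3: pin the lockfile. If several versions of the crate are locked,
        # cargo needs the spec as <crate>@<old-version> instead of the bare name.
        ["cargo", "update", "--package", crate, "--precise", version],
        ["just", "rs/format"],
        # (Step 4, the agent's build-and-fix loop, runs here.)
        ["git", "commit", "-am", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--head", branch, "--base", "main",
         "--title", title, "--body", f"Remediates a Dependabot alert by pinning {crate} to {version}."],
        ["gh", "pr", "merge", branch, "--auto", "--squash"],              # step 6: arm auto-merge
    ]
```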

Breaking-change ceiling. DREW fixes call-site breakage but does not chase a major version needing real redesign — it leaves the PR open with a comment and escalates rather than forcing a bad merge.

Native Dependabot. GitHub's own security-update PRs would duplicate and race DREW; disable them (or have DREW adopt the existing branch). §3.13 — open.

Deterministic verifier — required before lights-out (GOPHER §5.8). Once no human reviews before merge, the out-of-agent checker is the last gate: the agent can be confidently wrong, a dumb checker cannot. Built and validated in phase 4 against DREW's own diffs before any unattended merge.
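A first cut of the out-of-agent verifier can be a handful of dumb checks over the diff. This sketch assumes the changed paths and added lines have already been extracted, for example from `git diff --name-only` and the `+` lines of `git diff`. The following parts are hypothetical placeholders to be calibrated against the phase-4 diffs:
- the allowed-file rules;
- the "plausible source" heuristic (only `.rs`/`.ts`/`.tsx`/`.js` files);
- the example secret patterns (AWS access key ID, PEM private key header, classic GitHub PAT);
- the Cargo.lock-only new-dependency check.

```python
import re
from typing import Dict, List

MANIFESTS = {"Cargo.toml", "Cargo.lock", "package.json", "package-lock.json"}
SOURCE_SUFFIXES = (".rs", ".ts", ".tsx", ".js")
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|ghp_[A-Za-z0-9]{36}")
NEW_CRATE_RE = re.compile(r'^name = "([^"]+)"$')  # a new [[package]] entry in Cargo.lock


def verify(changed: List[str], added: Dict[str, List[str]], bumped: str, build_ok: bool) -> List[str]:
    """Return a list of violations; an empty list means the diff passes."""
    problems: List[str] = []
    if not build_ok:
        problems.append("build failed")
    if not any(p.rsplit("/", 1)[-1] in MANIFESTS for p in changed):
        problems.append("no manifest or lockfile changed")
    for path in changed:
        if path.rsplit("/", 1)[-1] not in MANIFESTS and not path.endswith(SOURCE_SUFFIXES):
            problems.append(f"unexpected file: {path}")
    for path, lines in added.items():
        for line in lines:
            if SECRET_RE.search(line):
                problems.append(f"possible secret in {path}")
            m = NEW_CRATE_RE.match(line) if path.endswith("Cargo.lock") else None
            if m and m.group(1) != bumped:
                problems.append(f"new dependency {m.group(1)} in {path}")
    return problems
```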

Security model & blast radius

An autonomous agent with shell + gh write that merges its own PRs is a real blast-radius question. The injection surface is smaller than the incident lane (dependency metadata, not prod logs) but the merge authority is larger; defenses are layered.

Threat	Control
Exfiltration / C2	Network-layer egress allowlist via iron-proxy — default-deny MITM proxy permitting Anthropic, GitHub, and the registries only; TLS-terminating (path-scoped rules), filters MCP `tools/list`. Built + mandatory in the prototype, ships in warn mode then enforce.
Unreviewed code on `main`	Safety is scope + gate: bounded change class, full CI as the gate, the deterministic verifier before merge (§2d), and the open-PR throttle so at most one unattended merge is in flight — trivially `git revert`-able. Merge gated by branch protection, not the token.
Malicious upstream package	Pin to `first_patched_version`, never latest; bumped code only executes in CI's sandbox; the egress jail blocks a hostile build script from phoning home. New transitive deps flagged by the verifier + Adjudicator (§2g).
Prompt injection via metadata	Bounded token, egress jail, ephemeral clone, a system-prompt rule to treat fetched data as evidence, never instructions, and iron-proxy credential-swap — the worker holds only opaque proxy tokens for GitHub and the Anthropic key, useless if exfiltrated.
Runaway loop / slow CI	Wall-clock `docker kill` + a cap on shepherd attempts (§3.6); the normal stop is the worker's `DREW_STATUS` exit. The throttle means only one worker.
Container escape	Hardened container (CARL §4): `--rm`, non-root, drop caps + `no-new-privileges`, `--init`, cpu/mem limits, dedicated bridge. Still to land: read-only rootfs, tmpfs `noexec`, no `docker.sock`. Creds via `--env-file` the supervisor `unlink`s after start.
Degraded loop	Context-window monitor (GOPHER §9.5): parse stream-json `usage`, hard-kill at 50% of the window to start (conservative edge of RULER's band — §3.6); phase 4 logs context-% vs outcome to tune it.
Lingering state / audit	Fresh `--rm` container + clone per run; every tool call (`PostToolUse`) + gzipped transcript lands under `.drew-audit/` (S3 + `#drew-audit` in prod).
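The hardening recipe in the container-escape row maps onto standard `docker run` flags. This sketch shows how the supervisor might assemble the worker invocation, with these assumptions:
- The image name, network name, user ID, and resource limits are placeholders.
- It includes the read-only rootfs and `noexec` tmpfs that are still to land.
- The inner `claude -p` arguments are kept minimal: `--output-format stream-json` with `--verbose`, which print mode has required for streaming JSON. Check both against the CLI reference.

Not mounting `docker.sock` is simply the absence of a `-v` flag.

```python
from typing import List


def worker_argv(image: str, env_file: str, network: str, prompt: str,
                cpus: str = "4", memory: str = "10g") -> List[str]:
    """Assemble a hardened `docker run` for one remediation (all values are placeholders)."""
    return [
        "docker", "run",
        "--rm",                                      # destroyed on exit
        "--init",                                    # reap zombies, forward signals
        "--user", "1000:1000",                       # non-root
        "--cap-drop", "ALL",                         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",       # block setuid escalation
        "--read-only",                               # read-only rootfs (still to land)
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=1g",  # scratch space without exec (still to land)
        # The clone and target dir need a writable, exec-capable mount (omitted here),
        # because cargo build scripts compile and run inside it.
        "--cpus", cpus,
        "--memory", memory,
        "--network", network,                        # dedicated bridge routed via the egress proxy
        "--env-file", env_file,                      # the supervisor unlinks this after start
        image,
        "claude", "-p", prompt,
        "--output-format", "stream-json", "--verbose",
    ]
```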

Don't rely on the harness sandbox for the network boundary. Its filter is hostname-based and doesn't inspect TLS. The trustworthy wall is iron-proxy at the container layer with an explicit allowlist + credential-swap, done Linux-side in the Docker VM.

Why the API key strengthens the jail. An earlier Max OAuth credential wasn't a swappable header key, so credential-swap couldn't cover the Anthropic side. A metered ANTHROPIC_API_KEY (§3.1) lets iron-proxy swap the x-api-key header too — both creds become opaque proxy tokens.

Deployment on the Mac mini

Borrows ax-bard's software story (Tailscale + AWS Secrets Manager + Docker) but on a single Mac mini (M4, 16 GB) rather than EC2 (§3.9).

Runtime: Docker Desktop; supervisor is always-on, workers are docker run --rm siblings on an isolated bridge routed through the egress proxy.
Auth: a metered API key injected into the worker, from Secrets Manager in prod; swappable, so iron-proxy can credential-swap it (§3.1, §2e).
Secrets: loaded from AWS Secrets Manager at startup; the GitHub App creds mint a per-remediation installation token.
Admin: Tailscale.
Lifecycle: launchd or a restart-always container.
Build cache: warm CARGO_HOME + shared target/ via sccache (GOPHER §7), keyed on (toolchain hash, Cargo.lock hash). A bump changes Cargo.lock by design, so DREW pays a partial rebuild on each fix.
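The cache keying can be sketched as a hash over the toolchain identity and the lockfile bytes, which makes the partial-rebuild trade-off concrete: any bump changes the key. `cache_key` and `restore_candidates` are hypothetical. The toolchain string is assumed to come from something like `rustc -vV` output. The toolchain-only prefix fallback borrows the restore-keys idea from `actions/cache` and is an assumption, not something the design specifies.

```python
import hashlib
from typing import List


def cache_key(toolchain: str, cargo_lock: bytes) -> str:
    """Key a warm build cache on (toolchain hash, Cargo.lock hash)."""
    tool_h = hashlib.sha256(toolchain.encode()).hexdigest()[:16]
    lock_h = hashlib.sha256(cargo_lock).hexdigest()[:16]
    return f"cargo-{tool_h}-{lock_h}"


def restore_candidates(toolchain: str, cargo_lock: bytes) -> List[str]:
    """Exact key first, then a toolchain-only prefix so a bumped lockfile reuses most artifacts."""
    exact = cache_key(toolchain, cargo_lock)
    return [exact, exact.rsplit("-", 1)[0] + "-"]
```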

Why a Mac mini is fine. The brain is the harness talking to Anthropic; the box only clones, runs cargo check, and shells out to git/gh.

Watch the 16 GB under Docker. cargo check on the ax workspace in the Docker Linux VM (give it ~12 GB) is the one memory-heavy local act — one worker fits, two would thrash. The open-PR throttle pins concurrency at one, and CI is the real compile authority, so local cargo check is a best-effort pre-flight. Step to 24 GB only for parallel workers.

The Adjudicator direction: moving the bottleneck, not hiding it

DREW solved "who does the mechanical work" and exposed the next constraint: who reviews it. Two bad answers bracket the design space — delete the review gate (unacceptable) and keep a human on every bump forever (the bottleneck DREW exists to remove). The adjudicator is the third: a second, independent agent whose only job is review. Three commitments:

Defer-by-default. The contract is asymmetric — a wrong defer costs one human review, a wrong approve lands unreviewed code on main. It approves only when every check passes (bounded class, verifier, CI green, zero findings, calibrated false-pass rate) and any doubt produces a "deferred to human" comment.
Separation of duties. A different App identity (ax-adjudicator) — GitHub refuses self-approval, and author and judge share no token, container, or transcript (§3.14); disagreement is signal.
Calibration before authority. Ships as a shadow reviewer (A1) scored against human outcomes; promotion needs a measured false-pass rate, not a vibe.

Scope grows as trust did on the trunk: DREW's own drew/deps/* PRs first, then other mechanical classes (A3); human feature work stays out of scope.

The CI/CD direction: ambient leverage

rust-test and rust-clippy sit on every PR's critical path — including DREW's own shepherd loop — so minutes shaved there compound. Shaped like the trunk's trust ladder:

Measure first (C1). Read-only Actions-API telemetry; no optimization without a baseline and hotspot ranking.
Coverage is the invariant (C2). Every optimization PR carries a machine-checkable proof that the executed-test and enabled-lint sets didn't shrink (§3.18). "Faster because it does less" is a regression in disguise.
Humans merge pipeline changes — structurally. Workflow definitions gate the whole repo, and the App deliberately lacks Workflows: write, so the cage cannot push a workflow edit (§3.17 — open).
Then ratchet (C3). A watchdog that notices main regressing, bisects, and files the issue — the "monitor" half, running forever.