ADR-058: Runner Pool Segmentation and Verification Capacity Governance

Field	Value
Decision ID	ADR-058
Initiative	unsorry CI scalability and fork-safe verification
Proposed By	unsorry maintainers
Date	2026-06-16
Status	Proposed

Context

ADR-049 makes unsorry safe for untrusted contributors by keeping the load-bearing soundness verdict on a project-controlled central re-check. Issue #1206 then shows the next constraint: non-contributor fork support and larger agent fleets can make verification capacity the bottleneck even when soundness is already safe.

The current Gate A workflow already has the useful split:

cheap detection and aggregation jobs run on GitHub-hosted ubuntu-latest,
trusted Lean verification jobs run on namespace.so profiles,
prepare/build work routes to namespace-profile-unsorry-prepare,
axiom audit work routes to namespace-profile-unsorry-audit,
kernel replay work routes to namespace-profile-unsorry-replay.

That split should become an explicit policy. GitHub-hosted runners are useful as cheap elastic intake capacity, but they give less control and operational visibility than the namespace lane. Namespace runners provide the better surface for trusted Lean verification because the project controls the profile, cache volume, sizing, and logs around the verifier boundary.

As of this decision, the operating model is:

namespace-profile-unsorry-prepare: small prepare/build profile for cache warming, goal builds, statement bindings, and archive package validation.
namespace-profile-unsorry-audit: axiom-audit profile sized for one resident mathlib image.
namespace-profile-unsorry-replay: replay profile sized for leanchecker and full-replay headroom.
GitHub-hosted runners: cheap actions, protocol checks, docs checks, PR intake, labels, generated-board refreshes, and final aggregation.

WH(Y) Decision Statement

In the context of unsorry scaling from trusted same-repo agents toward fork contributors, volunteer agents, and more parallel PRs,

facing the fact that a single PR can consume several minutes of runner time, and that letting fork, agent, generated-artifact, and trusted verifier jobs compete in one capacity pool would create queue starvation and CI denial-of- service pressure,

we decided for explicit runner pool segmentation and verification capacity governance: GitHub-hosted runners are the cheap/intake lane for low-trust, low-cost, and non-Lean work; namespace.so runners are the trusted verification lane for Gate A and any future central re-check that can admit content to the verified library; namespace-profile-unsorry-prepare, namespace-profile-unsorry-audit, and namespace-profile-unsorry-replay split the trusted verifier work by job role so the operator can scale prepare/build, axiom audit, and kernel replay capacity independently,

and neglected a single shared runner pool (rejected because noisy agent and fork work can starve merge-blocking verification), GitHub-hosted runners as the only verifier surface or a direct replacement for the role-specific Namespace verifier lanes (rejected because the trusted Lean lane needs stronger profile control, cache-volume control, and visibility; GitHub-hosted concurrency is useful enough to pilot, but not enough to make the protected verifier lane opaque by default), namespace runners for every cheap job (rejected because that wastes paid verifier capacity), contributor self-hosted runners as a merge-blocking verifier (rejected by ADR-049), and scaling first by making all runners larger (rejected because capacity isolation and queue policy are the first-order problem),

to achieve a CI architecture where cheap checks stay cheap, trusted verification stays protected, fork support can be opened without granting unbounded access to paid verifier minutes, and operator-visible namespace capacity is reserved for the work that actually carries soundness,

accepting that GitHub-hosted runner concurrency and queue visibility are plan-dependent, that namespace capacity costs more per trusted minute, that some comments/specs must be kept in sync with operator-side profile sizing, and that future fork automation must add identity/quota controls before it can freely spend namespace verifier capacity.

Runner Classes

Class	Runner surface	Trust / cost role	Examples
Cheap intake	GitHub-hosted `ubuntu-latest`	Low-cost checks before verifier spend	path filters, PR labels, ADR/spec lint, protocol checks
Required aggregator	GitHub-hosted `ubuntu-latest`	Stable required context wrapper	final `gate-a`, `gate-b` aggregation
Prepare verifier	`namespace-profile-unsorry-prepare`	Trusted build/cache preparation	goal builds, statement bindings, archive package validation
Audit verifier	`namespace-profile-unsorry-audit`	Trusted axiom-audit verification	serial `axiom_audit` over changed scope
Replay verifier	`namespace-profile-unsorry-replay`	Trusted kernel replay	incremental replay and forced full replay
Scheduled backstop	namespace profile selected by verifier policy	Defense-in-depth verification	daily full replay with small replay chunk
Generated artifacts	GitHub-hosted unless verifier evidence is required	Interruptible maintenance	leaderboard, targets board, visualization refresh
Agent exploration	contributor/local or separate agent pool	Noisy advisory work	local proving, retries, candidate generation

Capacity Rules

Cheap GitHub-hosted checks should run before namespace verifier jobs wherever possible.
Fork PRs and unknown agents should not directly spend namespace verifier minutes without intake checks, maintainer approval, or future ADR-054 quota policy.
Required merge-blocking verifier jobs must have their own namespace lane and must not be starved by generated artifacts or agent exploration.
A workflow that can admit content to UnsorryLibrary must use the trusted verifier lane defined by ADR-049.
Gate A jobs should target their role-specific profiles: prepare/archive on namespace-profile-unsorry-prepare, audit on namespace-profile-unsorry-audit, and replay on namespace-profile-unsorry-replay.
Superseded runs should be cancelled by concurrency groups so stale commits do not occupy trusted verifier capacity.
Runner sizing is an operator-controlled capacity property; the repository records the current intended size and routing contract, but correctness must not depend on a hidden profile size.
Switching routine Gate A from the Namespace verifier lanes to GitHub-hosted runners requires a shadow benchmark or pilot PR first. The pilot must compare wall time, cache restore behavior, failure modes, queue wait, and verifier log quality against the namespace lane before any required check is moved.

Live Operations Transition

This decision is not allowed to assume an empty queue. The repository already has open proof PRs, queued GitHub-hosted checks, queued namespace Gate A jobs, and in-progress verifier runs. Runner segmentation must therefore be adopted as a live operations change.

The transition rules are:

Do not rename required contexts. gate-a and gate-b remain the branch protection contexts during the transition.
Do not rename namespace profiles during the transition. namespace-profile-unsorry-prepare, namespace-profile-unsorry-audit, and namespace-profile-unsorry-replay remain stable routing labels.
Do not cancel all existing PR runs as part of the cutover. Superseded runs may be cancelled by existing concurrency groups, but active PRs should either finish on their current workflow revision or be explicitly rebased/rerun.
Existing green PRs may merge under the previous workflow revision if they already passed the protected checks. The verifier contract did not weaken.
Existing queued or failed PRs may be re-run under their current revision, or rebased onto the new routing docs/policy when a maintainer wants the new metadata and comments to apply.
Keep both namespace profiles available until the open PR queue has drained through at least one normal Gate A cycle and one scheduled backstop has completed.
If queue depth is already high, pause new autonomous proof PR creation before changing runner capacity.
Record the cutover as an operator event: timestamp, open PR count, queued Gate A count, namespace profile sizes, and any manual cancellations.

The intended migration is additive: the PR documents and clarifies the routing contract without changing required-check names or the soundness boundary. Future workflow rewrites that materially change routing must carry their own live operations plan.

Consequences

Positive. GitHub-hosted runners absorb cheap PR and docs traffic without spending namespace verifier minutes.
Positive. Namespace runners remain reserved for trusted verifier work, where cache volumes, profile sizing, and log visibility matter.
Positive. Fork support has a clear governor: fork work can be cheap until it earns or receives access to the trusted verifier lane.
Positive. The existing Gate A split becomes an explicit platform contract instead of an implicit workflow detail.
Negative. Operator profile changes must be reflected in docs/specs or the repository will mislead maintainers.
Negative. GitHub-hosted runner concurrency is useful but plan-dependent, so queue guarantees cannot rely on an undocumented fixed number.
Negative. More lanes mean more policy surface: labels, path filters, concurrency groups, and quotas must remain understandable.

References

Reference ID	Title	Type	Location
REF-1	Runner pool segmentation spec	Specification	specs/SPEC-058-A-Runner-Pool-Segmentation-And-Verification-Capacity.md
REF-2	Decentralised CI Runner Architecture	Decision	ADR-049-Decentralised-CI-Runner-Architecture.md
REF-3	Verify-on-Ingest	Decision	ADR-048-Verify-On-Ingest.md
REF-4	Gate A Workflow	Specification	specs/SPEC-006-B-Gate-A-Workflow.md
REF-5	Volunteer-Scale Claim Substrate	Decision	ADR-053-Volunteer-Scale-Claim-Substrate.md
REF-6	Agent Identity, Quotas, and Reputation	Decision	ADR-054-Agent-Identity-Quotas-And-Reputation.md
REF-7	Non-contributor proof submission via forks	Issue	GitHub issue #1206

Status History

Status	Approver	Date
Proposed	unsorry maintainers	2026-06-16