Drafted via superpowers:feature-spec. Plan 6 of 7 in the skill
migration. Surface frozen at 4 cost-routable skills (code_review,
debug, retrospective, trainer); LiteLLM proxies model choice; pass-
rate drives the route decision with default-to-local plus an env
kill switch for the empty-data window. Plan 7 amendment baked in:
internal/skills/{review,debug,retrospective,trainer} survive Plan 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
22 KiB
Spec: Mode 2 routing pod
Plan 6 of 7 — Hyperguild Skill Migration. Loaded after
feature-specskill.
Problem Statement
Mode 2 (client-local) is the cost-and-sovereignty mode for paid client work — keep skill calls inside Tailscale, save tokens, but stay reliable. Plans 1–5 produced everything Mode 2 needs except the consumer: the brain MCP at :30330 is live, four skills are instrumented to log pass | fail | skip, and GET /pass-rate?skill=X&window=Y returns honest numbers (or null when there is no data). What is still missing is the policy layer that reads pass-rate and acts on it.
The supervisor pod (:30320) historically hosted full skill workers (tdd_red/green/refactor, code_review, debug, spec, retrospective, trainer, tier) but with no routing — every call ran local regardless of skill quality, and Claude Code in client-local mode silently lost access to Claude-quality work even when local was wrong. That's the regression Plan 6 fixes.
Why now: the supervisor pod is scheduled for retirement (Plan 7) and the data plumbing for routing decisions exists but has no consumer. Without Plan 6, Plan 7 cannot land.
Success Criteria
- A new pod
routingis deployed via Flux at NodePort:30310, alongside (not replacing) the supervisor and ingestion pods. Image built by gitea CI, deployment manifest underinfra/k3s/apps/routing/.kubectl -n routing get deploymentshows1/1 Ready. POST http://koala:30310/mcpresponds totools/listwith exactly four tools:code_review,debug,retrospective,trainer. Each tool's name + JSON schema is byte-identical to the supervisor's current advertisement (verified by snapshot test).- Bearer-token auth via env var
ROUTING_MCP_TOKEN(same opt-in pattern asSUPERVISOR_MCP_TOKENshipped inf49850d). Empty token = no auth; populated token =Authorization: Bearer <token>required, otherwise HTTP 401 + JSON-RPC-32001. - On every tool call, the pod queries
${BRAIN_URL}/pass-rate?skill=<tool>&window=7dand applies a configurable policy:pass_rate == null→ route to local (default-to-local)pass_rate ≥ HYPERGUILD_ROUTE_LOCAL_FLOOR(default0.90) → route to localHYPERGUILD_ROUTE_LOCAL_CEIL ≤ pass_rate < FLOOR(CEIL default0.70) → 50/50 deterministic sample (hash of canonical request body)pass_rate < CEIL→ route to Claude
- Both routes resolve to a LiteLLM call: local route uses
HYPERGUILD_LOCAL_MODEL(defaultqwen35), Claude route usesHYPERGUILD_CLAUDE_MODEL(defaultclaude-sonnet-4-6). LiteLLM at${LITELLM_BASE_URL}(defaulthttp://piguard:4000) handles provider routing. The routing pod has no direct Anthropic SDK. - Every routing decision is logged via
session_logto the brain pod with{skill: "_routing", phase: "decide", final_status: "skip", message: "<tool>: <decision>", duration_ms, project_root}.final_status: "skip"keeps these entries out of any skill's pass-rate aggregation. - LiteLLM unreachable → fail open to a Claude decision and log
final_status: "fail"for_routing. The pod must still serve requests even if LiteLLM is down for hours. cmd/hyperguild/mode.goupdated:mode client-localwrites the routing entry with"headers": {"X-Hyperguild-Mode": "client-local"}and the_routing_pendingplaceholder field is removed. The pod accepts but does not branch on the header (forward-compat only).task check(lint + test + vet + drift + govulncheck) passes on each task and on the merged branch. The CI gate that bit Plan 1 must not bite Plan 6 (perfeedback_per_task_verificationmemory).- A new
task smoke:routingtarget boots the binary against the live LiteLLM atpiguard:4000and the live/pass-rateatkoala:30330, calls each of the four advertised tools once, and verifies a_routingentry appears in the brain viaGET /pass-rate?skill=_routing&window=1h. This is the live-contract test (per2026-05-03-fake-tests-vs-real-contractbrain entry); fake-server unit tests verify policy logic, the smoke step verifies the contract. - Mode 1 (
cloud) and Mode 3 (sovereign) are byte-identically unchanged. Verified bygit diffshowing no changes tomode.go'smodeCloudormodeSovereignfunctions.
Constraints
- Stdlib + existing deps only. The routing pod reuses
internal/exec/litellm.go(NewLiteLLM,Complete),internal/registry, andinternal/skills/{review,debug,retrospective,trainer}/. No new third-party dependency. Auth code may be duplicated frominternal/mcp/server.goor extracted to a shared helper — implementer's call. - No new persistence. Pass-rate data lives in the brain pod's session JSONL files (Plan 5). Routing-decision logs land in the same place via
session_log. Routing pod has no DB, no cache, no on-disk state beyond an optional in-memory pass-rate cache (TTL = 60 seconds — protects the brain from per-call hammering during an active session). - MCP wire format identical to supervisor's. Tools have the same names and JSON schemas as today. A consumer switches modes by changing only the URL in
.mcp.json— no schema-level differences. Snapshot tests pin this. - Pod must start and serve degraded. If LiteLLM is down at startup, the pod still binds to
:3210, advertises tools, and serves requests with the fail-open-to-Claude behavior described in success criteria. internal/skills/{review,debug,retrospective,trainer}/survives Plan 6. Plan 7's note about deleting them is amended: those four packages are reused by the routing pod and must NOT be deleted in Plan 7. Plan 7 deletes onlyinternal/skills/{tdd,spec}/, the supervisor binary, the supervisor manifests, and frees NodePort:30320. This spec calls out the change so Plan 7's author doesn't delete needed code (per2026-05-03-implicit-cleanup-third-categorybrain entry).- No retries beyond fail-open. A LiteLLM call that errors becomes a Claude decision and a
final_status: "fail"log. No exponential backoff, no circuit breaker — that's policy for a future plan once the failure shape is observed. - Determinism in sampling. When pass-rate is in the sample band (
CEIL ≤ pr < FLOOR), the local-vs-Claude choice for a given request is reproducible: hash a canonical JSON of the request body, low bit picks local. Same input → same decision. Avoids per-call variance confusing the operator during a debugging session.
Out of Scope
- Plan 7 (supervisor retirement). Separate plan, executed after Plan 6 stabilizes. Plan 6 leaves the supervisor pod running; nothing about supervisor changes in this plan.
- Routing for
tdd_red/green/refactor,spec,tier. Perproject_per_skill_routing.md, these are SKILL.md or CLI, not routing-pod tools. They never appear in the routing pod'stools/list. If a future plan changes that decision, it adds them then. - Routing for
brain_ingest. Already routed at the brain pod (Plan 1). No change. - Per-mode policy branching. The pod accepts
X-Hyperguild-Modefor forward-compat but treats absent or unknown values asclient-local. No code path differs on the header value yet. - OAuth, IP allowlisting, rate limiting, audit logging. Bearer-token only; same risk model as the supervisor MCP after
f49850d. - Decision-log read endpoints. Routing decisions land in the brain via
session_log. Reads happen via the existingGET /pass-rateendpoint and JSONL inspection. No new read API. - Materialized routing-decision aggregates. Out of scope for the same reason Plan 5 deferred materialized counters: on-demand scans are fast enough at current data volumes.
- Tunable per-skill thresholds.
FLOORandCEILare global. If the operator decidesdebugneeds a different floor thancode_review, that's a follow-up plan with real data behind the choice. - Sampling beyond a 50/50 hash split. No epsilon-decay schedules, no Thompson sampling, no per-skill exploration policies. Add when data justifies.
- Migration of any existing supervisor-skill
.mcp.jsonregistrations. Consumers update their.mcp.json(viahyperguild mode client-local) when they want Mode 2 behavior. No silent redirect. - Routing-pod-side prompt customization. The four skill packages already own their prompts; the routing pod just calls into them via the existing
Skillinterface. Prompt edits remain a SKILL.md orinternal/skills/<x>/concern.
Technical Approach
A. Binary layout: cmd/routing/
A new Go binary at cmd/routing/main.go. Stdlib + internal/*. Wires:
- Config from env (typed struct in
internal/config/routing.go— peer toConfigfor the supervisor; deliberately a separate type because the surfaces are different and merging would force every routing-pod field onto the supervisor and vice versa). internal/exec/litellm.NewLiteLLM(...)— same client the supervisor uses.internal/skills/{review,debug,retrospective,trainer}.New(...)constructors, each receiving aCompleteFuncthat wraps the routing decision (see C below).internal/registry.New()populated with the four skills.internal/mcp.NewServer(reg, cfg.MCPAuthToken)— reuse the existing handler with bearer auth fromf49850d. The handler is generic; nothing in it is supervisor-specific.
Rationale: the supervisor's runtime is already 80% of what the routing pod needs. Reusing it saves the routing pod from re-implementing skill dispatch, MCP protocol handling, and bearer auth. The only new code is the routing decision itself (C below) and the deployment manifests (G).
B. Configuration via env
Typed struct, parsed at startup. New env vars introduced by Plan 6:
| Env var | Default | Purpose |
|---|---|---|
ROUTING_PORT |
3210 |
Pod's HTTP port (NodePort :30310 maps to this) |
ROUTING_MCP_TOKEN |
— | Bearer token, opt-in (empty = no auth) |
LITELLM_BASE_URL |
http://piguard:4000 |
LiteLLM proxy (reused) |
LITELLM_API_KEY |
— | Reused, sourced from routing-secrets Secret |
BRAIN_URL |
http://ingestion.supervisor:3300 |
In-cluster brain pod for /pass-rate and session_log |
HYPERGUILD_LOCAL_MODEL |
qwen35 |
Model name passed to LiteLLM for the local decision |
HYPERGUILD_CLAUDE_MODEL |
claude-sonnet-4-6 |
Model name for the Claude decision |
HYPERGUILD_ROUTE_LOCAL_FLOOR |
0.90 |
At/above this pass-rate, always local |
HYPERGUILD_ROUTE_LOCAL_CEIL |
0.70 |
Below this, always Claude. Between CEIL and FLOOR is the sample band. |
HYPERGUILD_PASS_RATE_TTL_SECONDS |
60 |
Per-skill in-memory cache TTL |
Rationale: every value an operator might want to tune is an env var, not a hardcoded constant. Defaults are the recommendations from the kickoff and the per-skill-routing memory; sensible cluster values flow in via the Flux-managed Secret. No config file to manage.
C. Decision policy (internal/routing/policy.go)
Pure function, no I/O:
type Decision int
const (
DecideLocal Decision = iota
DecideClaude
)
type Policy struct{ Floor, Ceil float64 }
// Decide returns the routing decision. passRate may be nil when the brain has no data.
// requestHash is a deterministic 64-bit hash of the canonical request body — used only
// when passRate is in the sample band; same hash → same decision.
func (p Policy) Decide(passRate *float64, requestHash uint64) Decision { ... }
Rules (in order):
passRate == nil→DecideLocal(default-to-local)*passRate >= p.Floor→DecideLocal*passRate < p.Ceil→DecideClaude- Otherwise (sample band) →
requestHash & 1picks local on0, claude on1
Rationale: no I/O in the policy means the function is trivially testable (table-driven, no fixtures, no servers). Network calls happen in a wrapping layer that calls Decide — same separation as internal/skills/*/skill.go keeps prompt strings separate from Complete calls. Default-to-local rule is justified in project_per_skill_routing.md: the four advertised skills are exactly the skills marked "MCP→local" in that target architecture.
D. Pass-rate fetcher (internal/routing/passrate.go)
type Fetcher struct {
BaseURL string
HTTPClient *http.Client // 1s timeout
Cache *ttlCache // map[string]*float64 with 60s TTL, struct-internal
}
func (f *Fetcher) Get(ctx context.Context, skill string) (*float64, error)
Calls GET ${BaseURL}/pass-rate?skill=<skill>&window=7d. On success, caches the parsed pass_rate (which may be null) for HYPERGUILD_PASS_RATE_TTL_SECONDS. On error, returns (nil, err); the dispatch wrapper treats this as *passRate == nil and routes to local (the default-to-local fallback also covers brain-pod-down).
Rationale: GET is correct REST per 2026-05-03-rest-semantics-vs-precedent (this is a pure read with query params; it shouldn't follow the legacy POST-everywhere precedent). Cache TTL of 60s prevents per-call hammering during a tight Claude Code loop while staying fresh enough that a flapping pass-rate visibly affects routing within a minute. No persistence — restart loses cache, that's fine.
E. Dispatch wrapper
The four skills are constructed with their existing CompleteFunc signature ((ctx, model, system, user) (string, int64, error)). The routing pod wraps it:
func (r *Router) Complete(ctx context.Context, skill, model, system, user string) (string, int64, error) {
pr, _ := r.fetcher.Get(ctx, skill)
decision := r.policy.Decide(pr, hashCanonical(system, user))
chosenModel := r.cfg.ClaudeModel
if decision == DecideLocal {
chosenModel = r.cfg.LocalModel
}
out, ms, err := r.litellm.Complete(ctx, chosenModel, system, user)
r.logDecision(skill, decision, err, ms)
if err != nil {
// fail open: try Claude once if we routed local; if Claude also fails, return error.
if decision == DecideLocal {
chosenModel = r.cfg.ClaudeModel
out, ms, err = r.litellm.Complete(ctx, chosenModel, system, user)
r.logDecision(skill, DecideClaude, err, ms) // second log entry, marked fail if still erroring
}
return out, ms, err
}
return out, ms, nil
}
The skill packages don't know about routing — they receive a CompleteFunc and call it. The wrapper substitutes routing logic at construction time.
Rationale: keeps the skill packages oblivious to mode. Same internal/skills/review/ works under the supervisor (no routing) and under the routing pod (routed) without any conditional logic in the skill itself. Plan 7's deletion of the supervisor leaves the skills' shape intact for the routing pod.
F. Decision logging (internal/routing/log.go)
After every decision, POST a session log entry to ${BRAIN_URL}/write (the brain pod's existing endpoint, which appends to brain/sessions/<session>.jsonl). Entry shape:
{
"skill": "_routing",
"phase": "decide",
"final_status": "skip",
"message": "<original_skill>: <decision> (pass_rate=<value or 'null'>, model=<chosen>)",
"duration_ms": <litellm_round_trip>,
"project_root": "<path from request, or 'unknown'>",
"timestamp": "<RFC3339>",
"session_id": "<from request, or generated>"
}
final_status: "skip" keeps these entries out of any real skill's pass-rate aggregation (Plan 5's aggregator counts only pass/fail). Operators can still query GET /pass-rate?skill=_routing&window=7d for routing-failure visibility (when LiteLLM down → final_status: "fail" in the second log entry).
Rationale: closes the observability loop without adding a new endpoint or schema. _routing namespaces routing entries away from skill names. skip is the only honest classification — routing isn't itself a pass/fail event in the skill sense.
G. Deployment
New manifest directory infra/k3s/apps/routing/ mirroring infra/k3s/apps/supervisor/'s shape:
namespace.yaml— namespacerouting(peer tosupervisor)deployment.yaml— single replica, nodeSelector koala, image from gitea registry,envFrom: secretRef: routing-secretsservice.yaml— ClusterIP on port 3210nodeport.yaml— NodePort 30310 → service 3210secrets.enc.yaml— SOPS-encrypted, containsLITELLM_API_KEYand (optionally)ROUTING_MCP_TOKENkustomization.yaml— bundles the above
The supervisor pod's CI image build pattern (gitea Actions → gitea.d-ma.be/mathias/supervisor:<sha>) is replicated for gitea.d-ma.be/mathias/routing:<sha>. Flux's existing image-automation will bump the manifest's image tag on each push.
Rationale: copying the supervisor pod's manifest shape (rather than designing from scratch) is the YAGNI move. Flux + image automation already proven on supervisor; same pattern, same operator mental model. Mode 2 setup is now a Flux change, not a one-off kubectl ritual.
H. Live smoke test
task smoke:routing (in the project Taskfile) does:
- Boot the binary locally with
LITELLM_BASE_URL=http://piguard:4000andBRAIN_URL=http://koala:30330. Bind to a random localhost port (so it doesn't conflict with anything else). - Send
tools/listand assert four tool names. - For each tool, send a minimal valid
tools/call. Don't assert on response content — assert response shape (no error, has content). - After all four calls, query
GET http://koala:30330/pass-rate?skill=_routing&window=1hand asserttotal >= 4. - Tear down.
Skipped automatically when LiteLLM is unreachable or when run outside Tailscale (tier 3) — emits a SKIP line and exits 0. task check does NOT include task smoke:routing (CI runner doesn't have Tailscale); operator runs it manually before bumping production.
Rationale: unit tests with httptest.Server fakes verify the policy and the dispatch wrapper logic. The smoke test is the only thing that will catch a contract drift between the routing pod's Complete calls and the actual LiteLLM API, or a schema drift between /pass-rate and what the fetcher expects (per 2026-05-03-fake-tests-vs-real-contract).
I. Mode-template update (cmd/hyperguild/mode.go)
modeClientLocal is amended:
- The
routingentry'surlstays athttp://koala:30310/mcp. - A new key
headersis added with{"X-Hyperguild-Mode": "client-local"}. - The placeholder
_routing_pendingfield is removed, since the routing pod now exists.
Tests in cmd/hyperguild/mode_test.go are updated to assert the new structure. README in cmd/hyperguild/README.md updated to drop the "not deployed yet" note.
Rationale: Plan 4 deliberately scaffolded the placeholder for Plan 6 to fill in. This is the fill-in. Removing _routing_pending is the implicit cleanup the kickoff anticipates — making it explicit in the spec avoids a Plan-completeness gap (per 2026-05-03-implicit-cleanup-third-category).
Risks
-
Empty pass-rate window in the first weeks. Plans 3–5 merged on 2026-05-03; usage data has not accumulated. With default-to-local active for all four routed skills, the first weeks of Mode 2 = "everything goes local." If local quality is rough on
code_reviewordebug, the operator's first impression of Mode 2 is bad, and confidence in Plan 6 erodes before data lands. Mitigation: the FLOOR / CEIL are env-tunable. If local quality is unworkable in the first week, setHYPERGUILD_ROUTE_LOCAL_FLOOR=2.0(impossible threshold) and the pod becomes default-to-Claude with no code change. This is a deliberate kill switch for the early window. -
LiteLLM-as-single-dependency. The routing pod has exactly one upstream LLM provider:
piguard:4000. If LiteLLM is misconfigured (wrong model name routed to wrong provider, expired Anthropic key in LiteLLM's config), every routing-pod call returns garbage. Mitigation: the smoke test catches gross misconfig before deploy; once deployed, LiteLLM's own/healthendpoint is the canary (the pod doesn't probe it — operator monitors LiteLLM separately). If a deeper failure mode emerges, add a routing-pod liveness probe in a follow-up. -
Skill-schema drift. The routing pod's
tools/listis asserted byte-identical to the supervisor's via snapshot test. If someone evolves the supervisor's schemas between Plan 6 merge and Plan 7 (a long window), the snapshot drifts. Mitigation: the spec documents that Plan 6 freezes the schemas; supervisor edits to skill schemas are out of scope until Plan 7 deletes the supervisor. This is a soft constraint enforced by the spec, not by code. If the supervisor genuinely needs a schema change before Plan 7, that's a separate plan. -
Flux drift on
kubectl rollout restart. Demonstrated during the bearer-auth rollout earlier today: Flux server-side-applies the deployment every 30s and strips thekubectl.kubernetes.io/restartedAtannotation, which deletes the new ReplicaSet's pod. Mitigation: the Plan 6 implementer prompt and the README note thatkubectl delete pod -l app=routingis the correct way to force a restart on Flux-managed deployments — the existing ReplicaSet recreates without an annotation Flux can revert. (This finding is worth a brain entry; capture in retrospective.) -
Mode header not forwarded by Claude Code. Plan 6 assumes Claude Code propagates
headersfrom.mcp.json. The bearer-auth rollout proved this works forAuthorization. The same path should work forX-Hyperguild-Mode. Mitigation: the pod treats absent header asclient-local(the only mode that registers the pod). If forwarding silently breaks, behavior is identical — header is forward-compat only. -
Sample-band hash collision producing skewed routing. Hash inputs are
(system, user)strings. If skill prompts produce highly similar bodies (debug bug A vs debug bug B with similar wording), low-bit hash distribution might cluster on one side. Mitigation: at the volumes Plan 6 expects (single operator, ~10s of routed calls/hour at peak), bias is statistically invisible. If volume ever rises, swaphash & 1for a stronger split. Not the first failure mode worth pre-engineering.
Cross-references
- Spec for Plan 5 (consumer of
/pass-rate):docs/superpowers/specs/2026-05-03-pass-rate-logging-design.md - Spec for Plan 4 (which scaffolded the
:30310placeholder):docs/superpowers/specs/2026-05-03-hyperguild-cli-design.md - Auto-memory entries
project_three_modes,project_skill_migration_plans,project_per_skill_routing,feedback_per_task_verification,feedback_sudo - Brain entries
2026-05-03-rest-semantics-vs-precedent,2026-05-03-aggregator-normalization-backwards-compat,2026-05-03-fake-tests-vs-real-contract,2026-05-03-implicit-cleanup-third-category,2026-05-03-code-reviewer-output-as-candidates,2026-05-03-done-with-concerns-vs-blocked,2026-05-03-verification-depth-formula,2026-05-03-plan-canonical-dispatch-ephemeral