mathias/hyperguild

Mathias 2b7bbe38c7

CI / Lint / Test / Vet (push) Successful in 11s

CI / Mirror to GitHub (push) Successful in 4s

docs(eval): record M4 + M4b scorer runs — phase 2 gate cleared (infra#72)

Tier-weighted retrieval against the qa-2026-05.md 20-question set:

| run                            | top-1 | top-3 |
|--------------------------------|-------|-------|
| baseline (pre-phase-1)         | 20%   | 65%   |
| post phase 1 (parser+content)  | 20%   | 70%   |
| post M4 (tier weighting)       | 30%   | 75%   |
| post M4b (entities → K tier)   | 35%   | 80%   |

Net Phase 2 lift over the post-phase-1 run: +15pt top-1, +10pt top-3
(+15pt on both when measured from the baseline), which meets the
≥10pt close-gate set in infra#72.

Three remaining misses are content-keyword issues, not structure
issues (the questions don't share enough lexical surface with the
target entries to surface via BM25 alone). Vector search would
help here but the iguana embedder is off-mesh (see infra#64).

2026-05-25 18:51:29 +02:00

.context

chore: re-sync context adapters from updated root AGENT.md

2026-05-18 11:44:02 +02:00

.gitea/workflows

fix(cd): drop retired supervisor build, add routing rollout verification

2026-05-18 11:48:57 +02:00

.skills

chore: scaffold supervisor from project template

2026-04-16 21:50:53 +02:00

brain

docs(eval): record M4 + M4b scorer runs — phase 2 gate cleared (infra#72)

2026-05-25 18:51:29 +02:00

cmd

test(routing): de-flake TestRoutingPodEndToEnd

2026-05-18 20:00:18 +02:00

config

fix(config): make no-JSON instruction unmissable in protocols.md

2026-04-22 16:51:51 +02:00

docs

docs(plan6): implementation plan for Mode 2 routing pod

2026-05-04 14:53:03 +02:00

ingestion

fix(search,graph): M4b wiki/entities/ → tier=knowledge

2026-05-25 18:49:37 +02:00

internal

chore(routing): flip LITELLM_BASE_URL default to https://llm-api.d-ma.be

2026-05-24 15:06:23 +02:00

scripts

chore(routing): flip LITELLM_BASE_URL default to https://llm-api.d-ma.be

2026-05-24 15:06:23 +02:00

.aider.conventions.md

chore: re-sync context adapters from updated root AGENT.md

2026-05-18 11:44:02 +02:00

.cursorrules

chore: re-sync context adapters from updated root AGENT.md

2026-05-18 11:44:02 +02:00

.dockerignore

fix: add .dockerignore and non-root USER to Dockerfile

2026-04-20 20:27:42 +02:00

.env.example

feat: wire brain, org, sessionlog, retrospective skills into supervisor

2026-04-17 20:52:16 +02:00

.gitignore

chore: commit adapters; add context freshness gate to task check

2026-04-29 15:59:52 +02:00

.mcp.json

chore(mcp): remove supervisor entry from .mcp.json

2026-05-12 14:49:46 +02:00

.skills-shared

chore: scaffold supervisor from project template

2026-04-16 21:50:53 +02:00

AGENTS.md

chore: re-sync context adapters from updated root AGENT.md

2026-05-18 11:44:02 +02:00

CLAUDE.md

docs: update CLAUDE.md and DECISIONS.md for completed 7-plan migration

2026-05-12 14:53:08 +02:00

DECISIONS.md

feat(brain-mcp): OAuth 2.0 client_credentials flow for claude.ai

2026-05-18 22:21:54 +02:00

Dockerfile.routing

build(routing): Dockerfile + CD workflow

2026-05-05 07:19:18 +02:00

go.mod

feat(auth): add Dex JWT middleware to supervisor, routing pod, and brain MCP

2026-05-11 20:10:05 +02:00

go.sum

feat(auth): add Dex JWT middleware to supervisor, routing pod, and brain MCP

2026-05-11 20:10:05 +02:00

Procfile

feat(ingestion): wire watcher into server startup + fix Procfile env vars

2026-04-22 23:09:00 +02:00

README.md

refactor(routing): rename local/claude to fast/thinking model pair

2026-05-08 16:39:42 +02:00

Taskfile.yml

test(routing): live-contract smoke target

2026-05-05 22:52:23 +02:00

README.md

hyperguild

An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.

How it works

Your Claude Code session (in any project)
    │
    │  MCP over HTTP (Tailscale)
    ├──▶ supervisor  :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
    ├──▶ routing     :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
    └──▶ brain       :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
                       │
                       └─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
    │
    ▼
brain/
├── sessions/       — JSONL log, one file per session_id
├── wiki/           — searchable knowledge (full-text)
│   ├── concepts/
│   ├── entities/
│   └── sources/
├── raw/            — retrospective output, staged for review
└── training-data/  — SFT/DPO/RL data (Phase 2)

Phase 1 tools (available now)

| Tool            | What it does                                                             |
|-----------------|--------------------------------------------------------------------------|
| `tdd_red`       | Writes a failing test for a spec, verifies it fails                      |
| `tdd_green`     | Writes the minimal implementation to make tests pass                     |
| `tdd_refactor`  | Cleans up implementation while keeping tests green                       |
| `session_log`   | Appends a structured entry to the session JSONL log                      |
| `retrospective` | Reads the session log, identifies novel learnings, writes to brain/raw/  |
| `brain_query`   | Full-text search over brain/wiki/                                        |
| `brain_write`   | Writes a note to brain/raw/ (with optional YAML frontmatter)             |
| `tier`          | Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline)        |

Start the servers

# Requires goreman: go install github.com/mattn/goreman@latest
task start    # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop     # kills both by port

Connect a project

Create .mcp.json in your project root:

{
  "mcpServers": {
    "supervisor": {
      "type": "http",
      "url": "http://koala:30320/mcp"
    },
    "brain": {
      "type": "http",
      "url": "http://koala:30330/mcp"
    }
  }
}

The configuration above registers two MCP servers, both reachable over Tailscale:

supervisor at koala:30320 — skill workers (tdd_red/green/refactor, review, debug, spec, retrospective, trainer, tier).
brain at koala:30330 — knowledge access (brain_query, brain_write, brain_ingest, brain_ingest_raw) and session_log. Hosted by the ingestion service directly, no separate pod.

No local binary or stdio shim is required — Claude Code talks to both via HTTP.

Open Claude Code in your project — run /mcp to confirm both servers are listed.
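
Before opening Claude Code, it can help to sanity-check the `.mcp.json` file. The sketch below parses the structure shown above and verifies that each entry is an HTTP server with a well-formed URL. The `checkConfig` helper and its validation rules are illustrative assumptions, not part of hyperguild.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// server mirrors one entry under "mcpServers" in the example above.
type server struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

type mcpConfig struct {
	Servers map[string]server `json:"mcpServers"`
}

// checkConfig (hypothetical helper) returns the sorted server names,
// or an error when an entry is not an HTTP server with a valid URL.
func checkConfig(data []byte) ([]string, error) {
	var cfg mcpConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(cfg.Servers))
	for name, s := range cfg.Servers {
		if s.Type != "http" {
			return nil, fmt.Errorf("server %q: type %q, want \"http\"", name, s.Type)
		}
		u, err := url.Parse(s.URL)
		if err != nil || u.Host == "" {
			return nil, fmt.Errorf("server %q: invalid url %q", name, s.URL)
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	const example = `{
	  "mcpServers": {
	    "supervisor": {"type": "http", "url": "http://koala:30320/mcp"},
	    "brain":      {"type": "http", "url": "http://koala:30330/mcp"}
	  }
	}`
	names, err := checkConfig([]byte(example))
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints "[brain supervisor]"
}
```

This only checks the file's shape; whether the servers answer is still confirmed by running /mcp inside Claude Code.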

A typical TDD session

1. Call tdd_red    → spec in, failing test file out
2. Call tdd_green  → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log  → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context

Tier detection

The supervisor probes connectivity at call time:

| Tier | Label       | Condition                           |
|------|-------------|-------------------------------------|
| 1    | full-online | Can reach api.anthropic.com         |
| 2    | lan-only    | Can reach LiteLLM but not Anthropic |
| 3    | airplane    | No external connectivity            |
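
The table reads as a cascade: try Anthropic first, then LiteLLM, and fall back to tier 3. A minimal sketch of that ordering is shown below. It is not the supervisor's actual probe: the `probe` type, the TCP dial check, and the `litellm.internal:4000` address are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probe reports whether addr (host:port) is reachable.
type probe func(addr string) bool

// tcpProbe returns a probe that attempts a TCP connection with a
// timeout. A real check might issue an HTTP request instead.
func tcpProbe(timeout time.Duration) probe {
	return func(addr string) bool {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}
}

// detectTier applies the table top to bottom and returns the first
// tier whose condition holds.
func detectTier(reach probe, anthropicAddr, litellmAddr string) (int, string) {
	switch {
	case reach(anthropicAddr):
		return 1, "full-online"
	case litellmAddr != "" && reach(litellmAddr):
		return 2, "lan-only"
	default:
		return 3, "airplane"
	}
}

func main() {
	// A stub probe keeps the example deterministic; pass
	// tcpProbe(2*time.Second) to probe the network for real.
	reachable := map[string]bool{"litellm.internal:4000": true}
	stub := func(addr string) bool { return reachable[addr] }

	tier, label := detectTier(stub, "api.anthropic.com:443", "litellm.internal:4000")
	fmt.Println(tier, label) // prints "2 lan-only"
}
```

Passing the probe in as a function keeps the tier logic testable without network access.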

Key env vars

| Variable | Default | Purpose |
|----------|---------|---------|
| `INGEST_BRAIN_DIR` | `../brain` | Brain directory for ingestion server |
| `INGEST_PORT` | `3300` | Ingestion server port |
| `SUPERVISOR_CONFIG_DIR` | `./config/supervisor` | Skill discipline files |
| `SUPERVISOR_SESSIONS_DIR` | `./brain/sessions` | JSONL session logs |
| `INGEST_BASE_URL` | `http://localhost:3300` | Supervisor → ingestion |
| `LITELLM_BASE_URL` | (none) | LiteLLM proxy for Tier 2 model routing |
| `SUPERVISOR_MCP_TOKEN` | (none) | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
| `ROUTING_PORT` | `3210` | Routing pod's listen port |
| `ROUTING_MCP_TOKEN` | (none) | Optional bearer token for the routing MCP HTTP endpoint |
| `BRAIN_URL` | `http://ingestion.supervisor:3300` | Routing pod → brain (in-cluster) |
| `HYPERGUILD_FAST_MODEL` | `koala/qwen35-9b-fast` | Fast model for high-pass-rate skill calls |
| `HYPERGUILD_THINKING_MODEL` | `iguana/gemma4-26b` | Thinking model for low-pass-rate skill calls |
| `HYPERGUILD_ROUTE_LOCAL_FLOOR` | `0.90` | At or above this pass rate, route to the fast model |
| `HYPERGUILD_ROUTE_LOCAL_CEIL` | `0.70` | Below this pass rate, route to the thinking model. Pass rates from CEIL up to (but not including) FLOOR fall in the sample band. |
| `HYPERGUILD_PASS_RATE_TTL_SECONDS` | `60` | Per-skill pass-rate cache TTL, in seconds |

Operator note: LiteLLM at LITELLM_BASE_URL must register both HYPERGUILD_FAST_MODEL and HYPERGUILD_THINKING_MODEL for routing to do useful work. If the models are not registered, LiteLLM returns a 4xx, the routing pod's fast route fails, and the fail-open retry on the thinking model likely fails as well (both models are missing). The only signal is final_status: "fail" on _routing entries in the brain.
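
The two routing thresholds split pass rates into three bands. The sketch below reads the thresholds and model names from the environment, using the defaults from the table, and classifies a pass rate. How the routing pod picks a model inside the sample band is not described in this README, so the sketch only reports the band. The `decide`, `envFloat`, and `envString` helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envFloat returns the named variable parsed as a float, or def when
// it is unset or malformed.
func envFloat(name string, def float64) float64 {
	v, err := strconv.ParseFloat(os.Getenv(name), 64)
	if err != nil {
		return def
	}
	return v
}

// envString returns the named variable, or def when it is empty.
func envString(name, def string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return def
}

type route string

const (
	routeFast     route = "fast"
	routeThinking route = "thinking"
	routeSample   route = "sample" // between CEIL and FLOOR
)

// decide classifies a per-skill pass rate against the two thresholds.
func decide(passRate, floor, ceil float64) route {
	switch {
	case passRate >= floor:
		return routeFast
	case passRate < ceil:
		return routeThinking
	default:
		return routeSample
	}
}

func main() {
	floor := envFloat("HYPERGUILD_ROUTE_LOCAL_FLOOR", 0.90)
	ceil := envFloat("HYPERGUILD_ROUTE_LOCAL_CEIL", 0.70)
	fast := envString("HYPERGUILD_FAST_MODEL", "koala/qwen35-9b-fast")
	thinking := envString("HYPERGUILD_THINKING_MODEL", "iguana/gemma4-26b")

	fmt.Println("fast model:", fast, "thinking model:", thinking)
	for _, rate := range []float64{0.95, 0.80, 0.50} {
		fmt.Printf("pass rate %.2f: %s\n", rate, decide(rate, floor, ceil))
	}
}
```

Note the naming: FLOOR (0.90) is the higher threshold and CEIL (0.70) the lower one, so the sample band covers pass rates from 0.70 up to, but not including, 0.90.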

Phase 2 (planned)

review skill — structured code review with iron law enforcement
debug skill — hypothesis-driven debugging sessions
spec skill — generates specs from conversations
trainer — extracts SFT/DPO pairs from session logs for fine-tuning