Wires nomic-embed-text (iguana ollama) + pgvector on the shared
postgres18 into brain_query / brain_answer via Reciprocal Rank Fusion.
Pure BM25 stays the default; setting BRAIN_PG_DSN and BRAIN_EMBED_URL
together opts in. Setting one without the other is misconfiguration →
exit 1.
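The opt-in rule above can be sketched as a small gate. This is a minimal illustration, not the code in cmd/server/main.go: the hybridConfig type, the loadHybridConfig helper and the error value are hypothetical names; only the two environment variables and the exit code come from this change.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// hybridConfig holds the two settings that must be provided together.
// The type and its field names are hypothetical.
type hybridConfig struct {
	PGDSN    string
	EmbedURL string
}

var errPartialHybridConfig = errors.New("BRAIN_PG_DSN and BRAIN_EMBED_URL must be set together")

// loadHybridConfig returns (nil, nil) for BM25-only mode, a config when both
// variables are set, and an error when exactly one of them is set.
func loadHybridConfig(getenv func(string) string) (*hybridConfig, error) {
	dsn, url := getenv("BRAIN_PG_DSN"), getenv("BRAIN_EMBED_URL")
	switch {
	case dsn == "" && url == "":
		return nil, nil
	case dsn == "" || url == "":
		return nil, errPartialHybridConfig
	}
	return &hybridConfig{PGDSN: dsn, EmbedURL: url}, nil
}

func main() {
	cfg, err := loadHybridConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	if cfg == nil {
		fmt.Println("hybrid retrieval disabled; using BM25 only")
		return
	}
	fmt.Println("hybrid retrieval enabled")
}
```

Taking getenv as a parameter keeps the gate testable without touching the process environment.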
New packages:
- internal/embed
Client.Embed(ctx, text) → []float32 via POST {URL}/api/embed.
Defaults to nomic-embed-text:latest (768 dim). nil-on-empty-URL so
callers gate on a single nil check.
- internal/vectorstore
PGStore wraps a pgxpool against postgres18. Init creates
brain_embeddings(path PK, vector(768), updated_at) + HNSW cosine
index idempotently. Upsert / Delete / Search / KnownPaths.
Sync(brainDir, store, embedder) diffs brain/wiki/ against the store
and upserts new files / deletes removed ones; StartSync runs it on
a ticker (default 300s). Integration tests gated by BRAIN_PG_TEST_DSN.
- scripts/brain-embeddings-init.sql
One-time DBA setup: brain DB, brain_app role, vector extension,
GRANTs. Idempotent.
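The diff that Sync performs between brain/wiki/ and the store can be sketched as a pure set comparison. This is an illustration under assumptions: diffPaths is a hypothetical helper, and the real Sync also walks the directory, skips _index.md, calls the embedder and collects errors, none of which is shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// diffPaths compares the paths found on disk with the paths already present
// in the store. It returns the paths to embed and upsert (on disk but not
// stored) and the paths to delete (stored but no longer on disk).
// Paths present on both sides are left alone.
func diffPaths(onDisk, known []string) (toUpsert, toDelete []string) {
	diskSet := make(map[string]struct{}, len(onDisk))
	for _, p := range onDisk {
		diskSet[p] = struct{}{}
	}
	knownSet := make(map[string]struct{}, len(known))
	for _, p := range known {
		knownSet[p] = struct{}{}
	}
	for p := range diskSet {
		if _, ok := knownSet[p]; !ok {
			toUpsert = append(toUpsert, p)
		}
	}
	for p := range knownSet {
		if _, ok := diskSet[p]; !ok {
			toDelete = append(toDelete, p)
		}
	}
	// Sort so the result is deterministic despite map iteration order.
	sort.Strings(toUpsert)
	sort.Strings(toDelete)
	return toUpsert, toDelete
}

func main() {
	onDisk := []string{"concepts/rrf.md", "concepts/bm25.md", "entities/koala.md"}
	known := []string{"concepts/bm25.md", "sources/old-note.md"}
	add, del := diffPaths(onDisk, known)
	fmt.Println("upsert:", add)
	fmt.Println("delete:", del)
}
```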
Search layer:
- search.QueryOptions gains Vector + Embedder fields.
- QueryContext is the cancellable variant; Query stays for existing callers.
- When both are set, BM25 (top-N) and pgvector (top-4N) candidates
merge via Reciprocal Rank Fusion (k=60, Cormack et al. 2009 — no
tuning knob, robust to scale differences between rankers).
- Vector-only hits are hydrated from disk so callers see uniform
Result records (path, title, excerpt, wing, hall, score).
- Wing/hall filters still apply to vector candidates via path-prefix.
- On embedder/vector errors the search falls back to BM25 — embedding
outage degrades quality but doesn't take the brain offline.
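For reference, Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over every ranker that returned it, with 1-based ranks. The sketch below shows that formula with k=60; the fused type and the fuseRRF function are hypothetical names and not the code in the search package, and each input ranking is assumed to contain no duplicate paths.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the RRF constant; 60 is the value named in this change.
const rrfK = 60

// fused is a hypothetical result record: a path and its fused score.
type fused struct {
	Path  string
	Score float64
}

// fuseRRF merges any number of rankings (best hit first) into one list.
// Only ranks are used, so the rankers' raw scores never need to be on a
// comparable scale.
func fuseRRF(rankings ...[]string) []fused {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, path := range ranking {
			scores[path] += 1.0 / float64(rrfK+i+1) // i+1 is the 1-based rank
		}
	}
	out := make([]fused, 0, len(scores))
	for path, score := range scores {
		out = append(out, fused{Path: path, Score: score})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].Path < out[b].Path // stable tie-break
	})
	return out
}

func main() {
	bm25 := []string{"a.md", "b.md", "c.md"}
	vector := []string{"d.md", "b.md", "a.md", "e.md"}
	for _, f := range fuseRRF(bm25, vector) {
		fmt.Printf("%-5s %.6f\n", f.Path, f.Score)
	}
}
```

In this example d.md appears only in the vector ranking, yet its top vector rank places it above c.md, which appears only at rank 3 in BM25. That is the promotion of vector-only hits described above.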
MCP wiring:
- mcp.Server.WithHybridRetrieval(v, e) opt-in setter, same shape as
WithReranker.
- brainQuery and brainAnswer pass the wired vector/embedder through
to search.QueryContext.
REST:
- POST /backfill-embeddings drives Sync synchronously. Returns
{added, deleted, errors[]}. 503 when feature is unconfigured.
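The endpoint contract above can be sketched with net/http. This is an illustration under assumptions: syncFunc, backfillHandler, demoSync and callBackfill are hypothetical names, and a nil sync function stands in for "feature unconfigured"; the real handler drives vectorstore.Sync.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// syncResult mirrors the {added, deleted, errors[]} response shape.
type syncResult struct {
	Added   int      `json:"added"`
	Deleted int      `json:"deleted"`
	Errors  []string `json:"errors"`
}

// syncFunc is a hypothetical stand-in for a call into Sync.
type syncFunc func(ctx context.Context) syncResult

// backfillHandler runs the sync synchronously and reports the outcome.
// A nil run function means hybrid retrieval is not configured.
func backfillHandler(run syncFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if run == nil {
			http.Error(w, "hybrid retrieval is not configured", http.StatusServiceUnavailable)
			return
		}
		res := run(r.Context())
		if res.Errors == nil {
			res.Errors = []string{} // encode as [] rather than null
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(res)
	}
}

// demoSync is a fake sync used only for this example.
func demoSync(context.Context) syncResult {
	return syncResult{Added: 2, Deleted: 1}
}

// callBackfill exercises a handler in-process and returns status and body.
func callBackfill(h http.HandlerFunc, method string) (int, string) {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(method, "/backfill-embeddings", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	status, body := callBackfill(backfillHandler(demoSync), http.MethodPost)
	fmt.Println(status, body)
	status, body = callBackfill(backfillHandler(nil), http.MethodPost)
	fmt.Println(status, body)
}
```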
cmd/server/main.go:
- BRAIN_PG_DSN + BRAIN_EMBED_URL together enable hybrid; one alone
→ exit 1.
- vectorAdapter bridges *PGStore (returns []Hit) to
search.VectorSearcher (which expects []VectorHit) without either
package importing the other.
- BRAIN_EMBED_SYNC_INTERVAL (default 300s) controls the background
Sync ticker.
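The vectorAdapter idea can be sketched in a single file, with two local types standing in for the two packages. Every field name and method signature below is an assumption made for illustration (the actual Hit, VectorHit and VectorSearcher definitions are not shown in this description), and fakeStore replaces the real *PGStore.

```go
package main

import (
	"context"
	"fmt"
)

// Hit stands in for the vectorstore package's result type (fields assumed).
type Hit struct {
	Path  string
	Score float64
}

// VectorHit stands in for the search package's result type (fields assumed).
type VectorHit struct {
	Path  string
	Score float64
}

// VectorSearcher is an assumed shape for the interface the search layer uses.
type VectorSearcher interface {
	Search(ctx context.Context, embedding []float32, limit int) ([]VectorHit, error)
}

// hitSource is the subset of store behaviour the adapter needs.
type hitSource interface {
	Search(ctx context.Context, embedding []float32, limit int) ([]Hit, error)
}

// vectorAdapter lives in the wiring layer, so neither package has to import
// the other; it only converts one result type into the other.
type vectorAdapter struct{ store hitSource }

func (a vectorAdapter) Search(ctx context.Context, embedding []float32, limit int) ([]VectorHit, error) {
	hits, err := a.store.Search(ctx, embedding, limit)
	if err != nil {
		return nil, err
	}
	out := make([]VectorHit, len(hits))
	for i, h := range hits {
		out[i] = VectorHit(h) // legal because the struct layouts are identical
	}
	return out, nil
}

// fakeStore returns canned hits so the example runs without a database.
type fakeStore struct{}

func (fakeStore) Search(_ context.Context, _ []float32, limit int) ([]Hit, error) {
	hits := []Hit{{Path: "concepts/rrf.md", Score: 0.91}, {Path: "entities/koala.md", Score: 0.40}}
	if limit >= 0 && limit < len(hits) {
		hits = hits[:limit]
	}
	return hits, nil
}

// searchViaAdapter shows the adapter satisfying VectorSearcher.
func searchViaAdapter(limit int) ([]VectorHit, error) {
	var s VectorSearcher = vectorAdapter{store: fakeStore{}}
	return s.Search(context.Background(), make([]float32, 768), limit)
}

func main() {
	hits, err := searchViaAdapter(2)
	fmt.Println(hits, err)
}
```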
Backend pivot from Qdrant to pgvector recorded in DECISIONS.md
2026-05-18 (supersedes 2026-04-08): postgres18 already runs in
databases/ ns, Qdrant was never deployed, one engine beats two.
Dependency: github.com/jackc/pgx/v5: modern, with pgvector support via
parameterized vector literals.
Tests:
- embed.Client: empty-URL nil, request shape, dimension, upstream
error propagation, empty-text rejection.
- vectorstore.PGStore: dimension validation (unit); upsert/search/
KnownPaths (integration, BRAIN_PG_TEST_DSN-gated).
- vectorstore.Sync: adds new files, skips known, deletes
disappeared, skips _index.md, no-op when nil, collects embedder
errors.
- search.Query: hybrid promotes vector-only hits via RRF; falls
back to BM25 on embedder error.
Closes hyperguild#8.
hyperguild
An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.
How it works
Your Claude Code session (in any project)
│
│ MCP over HTTP (Tailscale)
├──▶ supervisor :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
├──▶ routing :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
└──▶ brain :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
│
└─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
│
▼
brain/
├── sessions/ — JSONL log, one file per session_id
├── wiki/ — searchable knowledge (full-text)
│ ├── concepts/
│ ├── entities/
│ └── sources/
├── raw/ — retrospective output, staged for review
└── training-data/ — SFT/DPO/RL data (Phase 2)
Phase 1 tools (available now)
| Tool | What it does |
|---|---|
| tdd_red | Writes a failing test for a spec, verifies it fails |
| tdd_green | Writes the minimal implementation to make tests pass |
| tdd_refactor | Cleans up implementation while keeping tests green |
| session_log | Appends a structured entry to the session JSONL log |
| retrospective | Reads the session log, identifies novel learnings, writes to brain/raw/ |
| brain_query | Full-text search over brain/wiki/ |
| brain_write | Writes a note to brain/raw/ (with optional YAML frontmatter) |
| tier | Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline) |
Start the servers
# Requires goreman: go install github.com/mattn/goreman@latest
task start # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop # kills both by port
Connect a project
Create .mcp.json in your project root:
{
"mcpServers": {
"supervisor": {
"type": "http",
"url": "http://koala:30320/mcp"
},
"brain": {
"type": "http",
"url": "http://koala:30330/mcp"
}
}
}
Two MCP servers are exposed today, both reachable over Tailscale:
- supervisor at koala:30320: skill workers (tdd_red/green/refactor, review, debug, spec, retrospective, trainer, tier).
- brain at koala:30330: knowledge access (brain_query, brain_write, brain_ingest, brain_ingest_raw) and session_log. Hosted by the ingestion service directly, no separate pod.
No local binary or stdio shim is required — Claude Code talks to both via HTTP.
Open Claude Code in your project — run /mcp to confirm both servers are listed.
A typical TDD session
1. Call tdd_red → spec in, failing test file out
2. Call tdd_green → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context
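Step 4 relies on an append-only JSONL file per session_id. A minimal sketch of that storage pattern follows; the logEntry fields and both helper functions are hypothetical, and the real session_log entry schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// logEntry is a hypothetical entry shape, used only for illustration.
type logEntry struct {
	SessionID string `json:"session_id"`
	Phase     string `json:"phase"`
	Result    string `json:"result"`
}

// appendEntry writes one JSON object per line to <dir>/<session_id>.jsonl.
// A real implementation would also validate SessionID before using it in a
// file name.
func appendEntry(dir string, e logEntry) error {
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	name := filepath.Join(dir, e.SessionID+".jsonl")
	f, err := os.OpenFile(name, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(append(line, '\n'))
	return err
}

// demoSessionLog logs the three TDD phases into a temporary directory and
// returns how many lines ended up in the session file.
func demoSessionLog() (int, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	for _, phase := range []string{"tdd_red", "tdd_green", "tdd_refactor"} {
		e := logEntry{SessionID: "demo-session", Phase: phase, Result: "ok"}
		if err := appendEntry(dir, e); err != nil {
			return 0, err
		}
	}
	data, err := os.ReadFile(filepath.Join(dir, "demo-session.jsonl"))
	if err != nil {
		return 0, err
	}
	return strings.Count(string(data), "\n"), nil
}

func main() {
	n, err := demoSessionLog()
	fmt.Println("lines written:", n, "err:", err)
}
```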
Tier detection
The supervisor probes connectivity at call time:
| Tier | Label | Condition |
|---|---|---|
| 1 | full-online | Can reach api.anthropic.com |
| 2 | lan-only | Can reach LiteLLM but not Anthropic |
| 3 | airplane | No external connectivity |
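The table reduces to a two-input decision once the probes have run. The sketch below shows only that classification step; the tier type and classifyTier function are hypothetical, and how the supervisor actually probes api.anthropic.com and LiteLLM is not shown.

```go
package main

import "fmt"

// tier pairs the numeric tier with its label from the table above.
type tier struct {
	Level int
	Label string
}

// classifyTier maps two reachability probes onto the three tiers.
// Anthropic reachability wins; LiteLLM alone means LAN-only.
func classifyTier(anthropicReachable, litellmReachable bool) tier {
	switch {
	case anthropicReachable:
		return tier{Level: 1, Label: "full-online"}
	case litellmReachable:
		return tier{Level: 2, Label: "lan-only"}
	default:
		return tier{Level: 3, Label: "airplane"}
	}
}

func main() {
	fmt.Println(classifyTier(true, true))
	fmt.Println(classifyTier(false, true))
	fmt.Println(classifyTier(false, false))
}
```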
Key env vars
| Variable | Default | Purpose |
|---|---|---|
| INGEST_BRAIN_DIR | ../brain | Brain directory for ingestion server |
| INGEST_PORT | 3300 | Ingestion server port |
| SUPERVISOR_CONFIG_DIR | ./config/supervisor | Skill discipline files |
| SUPERVISOR_SESSIONS_DIR | ./brain/sessions | JSONL session logs |
| INGEST_BASE_URL | http://localhost:3300 | Supervisor → ingestion |
| LITELLM_BASE_URL | (none) | LiteLLM proxy for Tier 2 model routing |
| SUPERVISOR_MCP_TOKEN | (none) | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
| ROUTING_PORT | 3210 | Routing pod's listen port |
| ROUTING_MCP_TOKEN | (none) | Optional bearer token for the routing MCP HTTP endpoint |
| BRAIN_URL | http://ingestion.supervisor:3300 | Routing pod → brain (in-cluster) |
| HYPERGUILD_FAST_MODEL | koala/qwen35-9b-fast | Fast model for high-pass-rate skill calls |
| HYPERGUILD_THINKING_MODEL | iguana/gemma4-26b | Thinking model for low-pass-rate skill calls |
| HYPERGUILD_ROUTE_LOCAL_FLOOR | 0.90 | At/above pass rate, route to fast model |
| HYPERGUILD_ROUTE_LOCAL_CEIL | 0.70 | Below pass rate, route to thinking model. Between CEIL and FLOOR is the sample band. |
| HYPERGUILD_PASS_RATE_TTL_SECONDS | 60 | Per-skill pass-rate cache TTL |
Operator note: LiteLLM at LITELLM_BASE_URL must register both HYPERGUILD_FAST_MODEL and HYPERGUILD_THINKING_MODEL for routing to do useful work. If a model is missing, LiteLLM returns 4xx, the routing pod's fast route fails, the fail-open retry on the thinking model likely also fails (since both are missing), and the only signal is final_status: "fail" on _routing entries in the brain.
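The FLOOR and CEIL thresholds in the table above define three bands. The sketch below classifies a pass rate into those bands using the documented defaults; chooseRoute is a hypothetical function, and what the routing pod actually does inside the sample band is not specified here, so the sketch only labels it.

```go
package main

import "fmt"

// Defaults taken from the env var table above.
const (
	defaultFloor = 0.90 // HYPERGUILD_ROUTE_LOCAL_FLOOR
	defaultCeil  = 0.70 // HYPERGUILD_ROUTE_LOCAL_CEIL
)

// chooseRoute returns "fast" at or above floor, "thinking" below ceil, and
// "sample" for the band in between. Note that floor is the higher value.
func chooseRoute(passRate, floor, ceil float64) string {
	switch {
	case passRate >= floor:
		return "fast"
	case passRate < ceil:
		return "thinking"
	default:
		return "sample"
	}
}

func main() {
	for _, rate := range []float64{0.95, 0.80, 0.50} {
		fmt.Printf("pass rate %.2f -> %s\n", rate, chooseRoute(rate, defaultFloor, defaultCeil))
	}
}
```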
Phase 2 (planned)
- review skill: structured code review with iron law enforcement
- debug skill: hypothesis-driven debugging sessions
- spec skill: generates specs from conversations
- trainer: extracts SFT/DPO pairs from session logs for fine-tuning