diff --git a/CLAUDE.md b/CLAUDE.md index 8e1b4e5..a0ad2b7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -47,31 +47,28 @@ ## MCP endpoints -Two MCP servers expose this project's tooling, both reachable over Tailscale: +Two MCP servers are live, both reachable over Tailscale and via HTTPS domain: -- **`brain`** at `http://koala:30330/mcp` — preferred path for `brain_query`, - `brain_write`, `brain_ingest`, `brain_ingest_raw`, and `session_log`. Hosted - by the ingestion service directly. -- **`supervisor`** at `http://koala:30320/mcp` — skill workers (`tdd_red`, - `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, - `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later - migration. +- **`brain`** at `https://brain-mcp.d-ma.be/mcp` (NodePort `koala:30330`) — + `brain_query`, `brain_write`, `brain_ingest`, `brain_ingest_raw`, + `brain_answer`, `brain_classify`, `session_log`. Hosted by the ingestion + service. Auth: Dex JWT (claude.ai OAuth) or static `BRAIN_MCP_TOKEN`. - **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises - the same four cost-routable skills as the supervisor (`review`, `debug`, - `retrospective`, `trainer`) but per-call decides whether to use a local - model or Claude based on the brain's `/pass-rate` response. Bearer auth - via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this - endpoint; Mode 1 and Mode 3 do not. + `review`, `debug`, `retrospective`, `trainer`; per-call routes to local model + or Claude based on brain `/pass-rate`. Bearer auth via `ROUTING_MCP_TOKEN` + (opt-in). Only `mode client-local` registers this endpoint. + +The supervisor MCP (`koala:30320`) was retired in Plan 7 (2026-05-12). Its +skill workers (`tdd`, `spec`) are now SKILL.md files; routed skills moved to +the routing pod; brain tools moved to the brain MCP. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, -`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for -shell scripts and non-MCP clients. +`/ingest-path`, `/backfill-refs`, `/pass-rate`) remains available on port 3300 +for shell scripts and non-MCP clients. -The brain HTTP REST API also serves a read-only `GET /pass-rate?skill=X&window=Y` -endpoint that aggregates `final_status` counts from session logs and returns -`{skill, window, pass, fail, skip, total, pass_rate}`. Plan 6 (routing pod) -reads this to decide whether to route skill calls to local models. Pass rate -is `null` when no logged invocations are in the window. +`brain_answer(query)` performs BM25 retrieval + LLM synthesis (berget.ai +gemma4:31b → iguana fallback). `brain_classify(text)` infers doc type, title, +and tags. Both require `BRAIN_LLM_PRIMARY_URL` to be set in the ingestion pod. ## Agent instructions diff --git a/DECISIONS.md b/DECISIONS.md index 1f731b7..f939898 100644 --- a/DECISIONS.md +++ b/DECISIONS.md @@ -72,23 +72,42 @@ Record *why* things are the way they are. Future-you will thank present-you. Plan 6 (Mode 2 routing pod, 2026-05-04) introduces a second consumer of the four cost-routable skill packages. The routing pod constructs each skill via `.New(Config{...})` and hands it `routing.Router.Run` as -the `CompleteFunc`. Plan 7 (supervisor retirement) MUST NOT delete the -four packages. +the `CompleteFunc`. -**Plan 7's allowed deletions:** -- `internal/skills/{tdd,spec,tier}/` (not consumed by the routing pod) -- `cmd/supervisor/` (binary) -- `Dockerfile` (supervisor's, at repo root — distinct from `Dockerfile.routing`) -- supervisor manifests in the infra repo -- NodePort `:30320` - -**Plan 7's preserved code:** +**Preserved code (do not delete):** - `internal/skills/{review,debug,retrospective,trainer}/` -- `internal/registry` -- `internal/mcp` -- `internal/exec/litellm.go` -- `internal/routing/` (entirely new in Plan 6) -- `cmd/routing/` +- `internal/registry`, `internal/mcp`, `internal/exec/litellm.go` +- `internal/routing/`, `cmd/routing/` + +--- + +## Plan 7: supervisor pod retired (2026-05-12) + +**What was deleted:** `cmd/supervisor/`, `internal/skills/{tdd,spec}/`, +root `Dockerfile`, supervisor k8s manifests (Deployment, Service, Ingress, +NodePort 30320), `supervisor` entry removed from all `.mcp.json` configs. + +**Coverage:** `tdd`/`spec` → SKILL.md files in `~/dev/.skills/`; `review`, +`debug`, `retrospective`, `trainer` → routing pod; `brain_*`/`session_log` → +brain MCP; `tier` → `hyperguild tier` CLI. + +--- + +## 2026-05-12 — brain_answer and brain_classify: LLM routing via berget.ai → iguana + +**Context:** Brain MCP returned raw BM25 excerpts with no synthesis. Adding +LLM-backed tools enables Q&A and ingestion enrichment without a separate service. + +**Decision:** Two new MCP tools in the ingestion service (`ingestion/internal/mcp/`): +- `brain_answer(query)` — BM25 top-10 → LLM synthesis → answer + sources +- `brain_classify(text)` — LLM classifies doc into type/title/tags + +Primary LLM: berget.ai `gemma4:31b` (EU cloud, spend tokens while available). +Fallback: iguana `gemma4:31b` (local Ollama). Reranker deferred to follow-up. +Router lives in `ingestion/internal/llm.Router`; opt-in via `BRAIN_LLM_PRIMARY_URL`. + +**Consequences:** Brain becomes a knowledge assistant, not just a search index. +When berget.ai tokens run out, flip `BRAIN_LLM_PRIMARY_URL` to iguana. ---