Mathias 37fdd33b2d
feat(ingestion): chunk markdown before embedding (#38)
Long markdown files (>~8KB) silently failed to embed because nomic-embed-text
on iguana has a 2048-token context. embed sync logged errors=1 every cycle
with no useful body until #37 added per-item logging — three files exceed
the ceiling: finbert source (8 KB), koala-machine-state (7.1 KB),
litellm-absorption (8.8 KB). Curated knowledge entries should never be
vector-blind.

Approach: chunk-before-embed, no schema change.

vectorstore/chunk.go (new)
- ChunkMarkdown splits at H1/H2 boundaries; sections over maxBytes are
  further split at paragraph boundaries, packing greedily under budget.
- NumberChunks assigns "<parent>#NNNN" storage paths (1-based, zero-padded
  to 4 digits — handles files with up to ~10k sections in stable sort order).
- ParentPath strips the chunk suffix for retrieval-side dedup.
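
The splitting strategy described above can be sketched as follows. This is a minimal illustration, not the code in vectorstore/chunk.go: the signature of ChunkMarkdown, the helpers splitSections and packParagraphs, and the early return for files that already fit the budget are all assumptions. It also ignores headings that appear inside fenced code blocks.

```go
package main

import "strings"

// isHeading reports whether a line opens an H1 or H2 section.
func isHeading(line string) bool {
	return strings.HasPrefix(line, "# ") || strings.HasPrefix(line, "## ")
}

// splitSections (hypothetical helper) cuts text before every H1/H2 line, so
// each heading starts a new section. Joining the result with "\n" reproduces
// the input.
func splitSections(text string) []string {
	var sections, cur []string
	for _, line := range strings.Split(text, "\n") {
		if isHeading(line) && len(cur) > 0 {
			sections = append(sections, strings.Join(cur, "\n"))
			cur = nil
		}
		cur = append(cur, line)
	}
	return append(sections, strings.Join(cur, "\n"))
}

// packParagraphs (hypothetical helper) splits a section at blank lines and
// greedily packs paragraphs into chunks of at most maxBytes. A single
// paragraph larger than maxBytes is emitted as-is; this sketch does not
// split it further.
func packParagraphs(section string, maxBytes int) []string {
	var chunks []string
	var cur string
	for i, para := range strings.Split(section, "\n\n") {
		switch {
		case i == 0:
			cur = para
		case len(cur)+len("\n\n")+len(para) <= maxBytes:
			cur += "\n\n" + para
		default:
			chunks = append(chunks, cur)
			cur = para
		}
	}
	return append(chunks, cur)
}

// ChunkMarkdown returns the pieces to embed. Assumption: a file that already
// fits under maxBytes stays a single chunk.
func ChunkMarkdown(text string, maxBytes int) []string {
	if len(text) <= maxBytes {
		return []string{text}
	}
	var chunks []string
	for _, sec := range splitSections(text) {
		if len(sec) <= maxBytes {
			chunks = append(chunks, sec)
			continue
		}
		chunks = append(chunks, packParagraphs(sec, maxBytes)...)
	}
	return chunks
}
```

With maxBytes = 4000, a short note comes back as one chunk, while a long file yields one chunk per H1/H2 section, with oversized sections broken at blank lines.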

vectorstore/sync.go
- After ChunkMarkdown produces N pieces, each is embedded + upserted as a
  separate brain_embeddings row at "<parent>#NNNN". maxChunkBytes = 4000
  (≈1000 nomic tokens, well under the 2048 ceiling with headroom for
  unicode/code blocks).
- "Already embedded?" check now reduces known paths to parent set via
  ParentPath, so the first chunk hit short-circuits the file.
- Delete walk also reduces via ParentPath; when a parent file disappears,
  every chunk row (and any pre-existing bare-path row, for backward
  compatibility with rows written before this change) gets dropped.
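
A sketch of how the path numbering and the parent-set reduction might fit together. The function names NumberChunks and ParentPath come from the notes above, but their signatures here are assumptions, and knownParents and staleRows are hypothetical stand-ins for the "already embedded?" check and the delete walk.

```go
package main

import (
	"fmt"
	"strings"
)

// NumberChunks returns storage paths "<parent>#NNNN" for n chunks, 1-based
// and zero-padded to 4 digits so they sort in chunk order.
func NumberChunks(parent string, n int) []string {
	paths := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		paths = append(paths, fmt.Sprintf("%s#%04d", parent, i))
	}
	return paths
}

// ParentPath strips a trailing "#NNNN" chunk suffix. Bare paths (rows written
// before chunking existed) are returned unchanged.
func ParentPath(path string) string {
	i := strings.LastIndexByte(path, '#')
	if i < 0 || len(path)-i-1 != 4 {
		return path
	}
	for _, c := range path[i+1:] {
		if c < '0' || c > '9' {
			return path
		}
	}
	return path[:i]
}

// knownParents reduces stored row paths to the set of parent files, so one
// chunk hit is enough to mark a file as already embedded.
func knownParents(rows []string) map[string]bool {
	parents := make(map[string]bool, len(rows))
	for _, r := range rows {
		parents[ParentPath(r)] = true
	}
	return parents
}

// staleRows returns every row (chunk or bare path) whose parent file is no
// longer present on disk; these are the rows a delete walk would drop.
func staleRows(rows []string, liveFiles map[string]bool) []string {
	var stale []string
	for _, r := range rows {
		if !liveFiles[ParentPath(r)] {
			stale = append(stale, r)
		}
	}
	return stale
}
```

Because bare paths pass through ParentPath untouched, old and new rows land in the same parent set without a migration step.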

search/search.go
- hybridMerge collapses chunk-path vector hits to parent via ParentPath
  before scope check, RRF accumulation, and hydration. A file with three
  chunk hits returns one result row, not three.
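
The collapse-then-merge step can be illustrated with a small reciprocal rank fusion sketch. It is not the hybridMerge implementation: the function names, the constant k = 60, the choice to keep only the best-ranked chunk per parent, and the omission of the scope check and hydration are all assumptions made for the example.

```go
package main

import (
	"sort"
	"strings"
)

// rrfK is the conventional RRF damping constant (an assumption here).
const rrfK = 60.0

// parentOf is a simplified stand-in for ParentPath: it drops everything from
// the last '#' onward.
func parentOf(path string) string {
	if i := strings.LastIndexByte(path, '#'); i >= 0 {
		return path[:i]
	}
	return path
}

// collapse maps ranked chunk hits to parent paths, keeping only the first
// (best-ranked) occurrence of each parent.
func collapse(hits []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, h := range hits {
		p := parentOf(h)
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

// mergeRRF fuses a full-text ranking and a vector ranking. Each list
// contributes 1/(k+rank) per path; results are sorted by total score, with
// ties broken by path for determinism.
func mergeRRF(textHits, vectorHits []string) []string {
	scores := make(map[string]float64)
	for _, list := range [][]string{textHits, collapse(vectorHits)} {
		for rank, p := range list {
			scores[p] += 1.0 / (rrfK + float64(rank+1))
		}
	}
	out := make([]string, 0, len(scores))
	for p := range scores {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool {
		if scores[out[i]] != scores[out[j]] {
			return scores[out[i]] > scores[out[j]]
		}
		return out[i] < out[j]
	})
	return out
}
```

Collapsing before accumulation matters: if the three chunk hits were scored separately, one long file could crowd out other results or appear several times in the output.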

Backward compatibility: pre-existing bare-path rows in brain_embeddings
keep working — ParentPath returns them unchanged, knownParents handles
them as if they were "wiki/foo.md#NNNN" hits, sync skips re-embed, and
search dedup is a no-op for them. No migration required to ship.

Tests:
- chunk_test.go covers short / heading split / oversized section /
  content preservation / chunk numbering / parent-path stripping.
- sync_test.go adds long-file chunking, single-chunk-row short file,
  skip-if-any-chunk-known, delete-all-chunks-of-disappeared-file.
  Existing tests updated for #NNNN paths.
- search_test.go adds chunk-paths-dedupe-to-parent.

Closes gitea/mathias/infra#38.
2026-05-19 21:57:09 +02:00

hyperguild

An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.

How it works

Your Claude Code session (in any project)
    │
    │  MCP over HTTP (Tailscale)
    ├──▶ supervisor  :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
    ├──▶ routing     :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
    └──▶ brain       :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
                       │
                       └─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
    │
    ▼
brain/
├── sessions/       — JSONL log, one file per session_id
├── wiki/           — searchable knowledge (full-text)
│   ├── concepts/
│   ├── entities/
│   └── sources/
├── raw/            — retrospective output, staged for review
└── training-data/  — SFT/DPO/RL data (Phase 2)

Phase 1 tools (available now)

| Tool | What it does |
|---|---|
| tdd_red | Writes a failing test for a spec, verifies it fails |
| tdd_green | Writes the minimal implementation to make tests pass |
| tdd_refactor | Cleans up implementation while keeping tests green |
| session_log | Appends a structured entry to the session JSONL log |
| retrospective | Reads the session log, identifies novel learnings, writes to brain/raw/ |
| brain_query | Full-text search over brain/wiki/ |
| brain_write | Writes a note to brain/raw/ (with optional YAML frontmatter) |
| tier | Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline) |

Start the servers

# Requires goreman: go install github.com/mattn/goreman@latest
task start    # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop     # kills both by port

Connect a project

Create .mcp.json in your project root:

{
  "mcpServers": {
    "supervisor": {
      "type": "http",
      "url": "http://koala:30320/mcp"
    },
    "brain": {
      "type": "http",
      "url": "http://koala:30330/mcp"
    }
  }
}

Two MCP servers are exposed today, both reachable over Tailscale:

  • supervisor at koala:30320 — skill workers (tdd_red/green/refactor, review, debug, spec, retrospective, trainer, tier).
  • brain at koala:30330 — knowledge access (brain_query, brain_write, brain_ingest, brain_ingest_raw) and session_log. Hosted by the ingestion service directly, no separate pod.

No local binary or stdio shim is required — Claude Code talks to both via HTTP.

Open Claude Code in your project — run /mcp to confirm both servers are listed.

A typical TDD session

1. Call tdd_red    → spec in, failing test file out
2. Call tdd_green  → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log  → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context
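
To make the logging step concrete, here is a rough sketch of appending one entry to a per-session JSONL file. The Entry fields, the ".jsonl" file extension, and the function names are hypothetical; the actual schema written by session_log is not documented here.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// Entry is a hypothetical log record; the real session_log schema may differ.
type Entry struct {
	SessionID string `json:"session_id"`
	Phase     string `json:"phase"`
	Result    string `json:"result"`
}

// encodeLine renders an entry as one JSON object followed by a newline,
// which is the JSONL framing.
func encodeLine(e Entry) ([]byte, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// appendEntry appends the entry to <dir>/<session_id>.jsonl, creating the
// file on first use. One file per session_id matches the brain/sessions/
// layout; the extension is an assumption.
func appendEntry(dir string, e Entry) error {
	line, err := encodeLine(e)
	if err != nil {
		return err
	}
	name := filepath.Join(dir, e.SessionID+".jsonl")
	f, err := os.OpenFile(name, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(line); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}
```

Append-only files with one object per line keep each phase result independent, so a reader such as retrospective can stream the log line by line.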

Tier detection

The supervisor probes connectivity at call time:

| Tier | Label | Condition |
|---|---|---|
| 1 | full-online | Can reach api.anthropic.com |
| 2 | lan-only | Can reach LiteLLM but not Anthropic |
| 3 | airplane | No external connectivity |
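
The tier table reduces to a short decision function. The sketch below assumes the probes are simple reachability checks; the Probe type, DetectTier, and httpProbe are illustrative names, and the supervisor's actual probing logic may differ.

```go
package main

import (
	"net/http"
	"time"
)

// Tier mirrors the numeric tiers in the table above.
type Tier int

const (
	TierFullOnline Tier = 1
	TierLANOnly    Tier = 2
	TierAirplane   Tier = 3
)

// Label returns the tier's label from the table.
func (t Tier) Label() string {
	switch t {
	case TierFullOnline:
		return "full-online"
	case TierLANOnly:
		return "lan-only"
	default:
		return "airplane"
	}
}

// Probe reports whether an endpoint is currently reachable.
type Probe func() bool

// DetectTier checks Anthropic first, then LiteLLM, and falls back to
// airplane mode when neither responds.
func DetectTier(anthropic, litellm Probe) Tier {
	switch {
	case anthropic():
		return TierFullOnline
	case litellm():
		return TierLANOnly
	default:
		return TierAirplane
	}
}

// httpProbe builds a Probe that treats any HTTP response, whatever its
// status code, as "reachable". This is an assumption about what counts as
// connectivity.
func httpProbe(url string, timeout time.Duration) Probe {
	client := &http.Client{Timeout: timeout}
	return func() bool {
		resp, err := client.Head(url)
		if err != nil {
			return false
		}
		resp.Body.Close()
		return true
	}
}
```

A caller could wire it up as DetectTier(httpProbe("https://api.anthropic.com", 2*time.Second), httpProbe(litellmURL, 2*time.Second)), where litellmURL would come from LITELLM_BASE_URL.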

Key env vars

| Variable | Default | Purpose |
|---|---|---|
| INGEST_BRAIN_DIR | ../brain | Brain directory for ingestion server |
| INGEST_PORT | 3300 | Ingestion server port |
| SUPERVISOR_CONFIG_DIR | ./config/supervisor | Skill discipline files |
| SUPERVISOR_SESSIONS_DIR | ./brain/sessions | JSONL session logs |
| INGEST_BASE_URL | http://localhost:3300 | Supervisor → ingestion |
| LITELLM_BASE_URL | (none) | LiteLLM proxy for Tier 2 model routing |
| SUPERVISOR_MCP_TOKEN | (none) | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
| ROUTING_PORT | 3210 | Routing pod's listen port |
| ROUTING_MCP_TOKEN | (none) | Optional bearer token for the routing MCP HTTP endpoint |
| BRAIN_URL | http://ingestion.supervisor:3300 | Routing pod → brain (in-cluster) |
| HYPERGUILD_FAST_MODEL | koala/qwen35-9b-fast | Fast model for high-pass-rate skill calls |
| HYPERGUILD_THINKING_MODEL | iguana/gemma4-26b | Thinking model for low-pass-rate skill calls |
| HYPERGUILD_ROUTE_LOCAL_FLOOR | 0.90 | At or above this pass rate, route to the fast model |
| HYPERGUILD_ROUTE_LOCAL_CEIL | 0.70 | Below this pass rate, route to the thinking model. Pass rates between CEIL and FLOOR fall in the sample band. |
| HYPERGUILD_PASS_RATE_TTL_SECONDS | 60 | Per-skill pass-rate cache TTL |

Operator note: LiteLLM at LITELLM_BASE_URL must register both HYPERGUILD_FAST_MODEL and HYPERGUILD_THINKING_MODEL for routing to do useful work. If the models are missing, LiteLLM returns 4xx and the routing pod's fast route fails. The fail-open retry on the thinking model likely fails too (both models are missing), and the only signal is final_status: "fail" on _routing entries in the brain.
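
The two thresholds define three bands, which can be sketched as below. The names Route, ChooseRoute, and envFloat are hypothetical, and what the routing pod actually does inside the sample band is not described here, so the sketch only reports the band and leaves that decision to the caller.

```go
package main

import (
	"os"
	"strconv"
)

// Route names the three bands implied by the FLOOR and CEIL thresholds.
type Route string

const (
	RouteFast     Route = "fast"
	RouteThinking Route = "thinking"
	RouteSample   Route = "sample"
)

// ChooseRoute maps a skill's pass rate to a band: at or above floor goes to
// the fast model, below ceil goes to the thinking model, and anything in
// between falls in the sample band.
func ChooseRoute(passRate, floor, ceil float64) Route {
	switch {
	case passRate >= floor:
		return RouteFast
	case passRate < ceil:
		return RouteThinking
	default:
		return RouteSample
	}
}

// envFloat reads a float from the environment, falling back to def when the
// variable is unset or malformed.
func envFloat(name string, def float64) float64 {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return def
	}
	return v
}
```

With the defaults from the table, ChooseRoute(rate, envFloat("HYPERGUILD_ROUTE_LOCAL_FLOOR", 0.90), envFloat("HYPERGUILD_ROUTE_LOCAL_CEIL", 0.70)) sends a 0.95 pass rate to the fast model, 0.50 to the thinking model, and 0.80 to the sample band.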

Phase 2 (planned)

  • review skill — structured code review with iron law enforcement
  • debug skill — hypothesis-driven debugging sessions
  • spec skill — generates specs from conversations
  • trainer — extracts SFT/DPO pairs from session logs for fine-tuning