Files

Mathias Bergqvist 848001f27f docs: add supervisor MCP server design spec

Captures the full architecture: Go MCP server on flamingo, Claude
supervisor+worker instance, LiteLLM/Ollama gradual delegation, TDD
as first skill, language-agnostic test runner detection, structured
JSON output contract, and three-layer model selection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-16 22:57:10 +02:00

7.4 KiB

Raw Blame History

Supervisor MCP Server — Design Spec

Date: 2026-04-16
Status: approved
Author: Mathias + Claude

Overview

The supervisor is a Go MCP server running locally on flamingo. It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. The first skill is TDD (red/green/refactor). The supervisor spawns a Claude instance to act as supervisor + worker, which runs verification commands directly on the target project's filesystem and optionally delegates code generation to Ollama models via LiteLLM.

Architecture

Claude Code (orchestrator session)
        │
        │  MCP tool call  e.g. tdd_red(project_root, spec)
        ▼
┌─────────────────────────────┐
│  supervisor (Go, flamingo)  │  thin — MCP protocol + process spawner only
└─────────────┬───────────────┘
              │  exec: claude --print "<prompt>"
              │         cwd: config/supervisor/   ← loads supervisor CLAUDE.md
              ▼
┌─────────────────────────────────────────┐
│  Claude supervisor instance             │
│  reads: config/supervisor/CLAUDE.md    │  orchestration rules, output contract
│  reads: config/supervisor/tdd.md       │  injected per skill invocation
│  tools: Read, Write, Bash              │  filesystem access + test runner
│  optional: LiteLLM → Ollama worker    │  code generation delegation (gradual)
└─────────────────────────────────────────┘
              │
              │  structured JSON response
              ▼
        MCP response → Claude Code

Key properties:

Go server knows nothing about TDD — it routes, spawns, and validates JSON
All orchestration logic is declarative in config/supervisor/CLAUDE.md
Ollama migration path requires no architectural changes — supervisor Claude delegates generation to LiteLLM and retains verification
Stateless — project path passed per call, no session registration

Project Structure

supervisor/
├── cmd/
│   └── supervisor/
│       └── main.go              ← starts MCP server, wires dependencies
├── internal/
│   ├── config/                  ← env vars + models.yaml → typed Config struct
│   ├── mcp/                     ← HTTP/SSE MCP server, tool dispatch
│   ├── registry/                ← maps tool names to skill handlers
│   ├── skills/
│   │   └── tdd/                 ← tdd_red, tdd_green, tdd_refactor handlers
│   └── exec/                    ← spawns claude --print, validates JSON output
├── config/
│   └── supervisor/
│       ├── CLAUDE.md            ← permanent orchestration rules
│       └── tdd.md               ← TDD discipline, injected per TDD call
├── docs/
│   └── superpowers/specs/       ← design specs (this file)
├── .env.example
└── Taskfile.yml

MCP Tools

TDD Skill

All tools are stateless. Project path is passed on every call.

Tool	Required inputs	Optional inputs	Returns
`tdd_red`	`project_root`, `spec`	`model`, `test_cmd`	JSON result
`tdd_green`	`project_root`, `test_path`	`model`, `test_cmd`	JSON result
`tdd_refactor`	`project_root`, `test_path`, `impl_path`	`model`, `test_cmd`	JSON result

Response schema (all tools)

{
  "status":        "pass | fail | error",
  "phase":         "red | green | refactor",
  "skill":         "tdd",
  "file_path":     "<absolute path to generated/modified file>",
  "runner_output": "<raw stdout+stderr from test runner>",
  "verified":      true,
  "model_used":    "<model that generated the code, or 'self'>",
  "message":       "<one sentence summary>"
}

verified reflects the subprocess exit code only — never the model's self-assessment.

Model Selection

Three-layer priority (highest to lowest):

model parameter in the MCP tool call — caller override
Per-skill default in config/models.yaml
Global fallback default

The Go skill handler reads the resolved model name and passes it to the Claude instance via the prompt (not an env var), so the supervisor Claude knows which model to delegate to per invocation.

# config/models.yaml
default: ollama/qwen3-coder-30b-tuned

skills:
  tdd:    ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
  debug:  ollama/deepseek-r1-tuned

LiteLLM on iguana is the gateway for all Ollama and cloud model calls. The Go server passes LITELLM_BASE_URL and the resolved model name to the spawned Claude instance via the prompt context.

Supervisor CLAUDE.md

Loaded for every invocation. Defines identity, output contract, verification rules, loop prevention, and test runner detection.

Output contract: Every response is raw JSON matching the schema above. No preamble, no markdown, no prose. Malformed output is treated as a failed invocation and triggers a retry.

Verification: Exit code is the only source of truth. verified: true only when the subprocess exit code matches the phase expectation.

Loop prevention:

Maximum 3 attempts per phase
Two consecutive identical outputs → immediate status: error
No retries after 3 failures

Test runner detection (inspects project_root):

Signal	Command
`go.mod`	`go test ./...`
`package.json`	`npm test`
`pyproject.toml` / `pytest.ini`	`pytest`
`Cargo.toml`	`cargo test`
`Gemfile`	`bundle exec rspec`
`mix.exs`	`mix test`

test_cmd parameter bypasses detection entirely.

Skill Discipline: TDD (`config/supervisor/tdd.md`)

Read from disk by the Go TDD handler and appended to the prompt string on every TDD tool call. Not part of the always-on CLAUDE.md.

Red: Write one test, one behavior. Run suite. Confirm it fails. If it passes, the test is wrong — return fail.
Green: Write minimal code to pass. Nothing beyond what the test requires. Run suite. Confirm it passes. Fix code, not test.
Refactor: Structure only, no new behavior. Tests must stay green after every change.

Gradual Ollama Migration

Start: Claude supervisor + worker (does everything itself).
Next: Claude delegates code generation to qwen3-coder-30b-tuned via LiteLLM, retains verification and judgment.
Later: More skills, more workers, model routing by task type.

No architectural changes required between stages — only config/models.yaml and supervisor/CLAUDE.md prompting strategy evolve.

Infrastructure

Runs on: flamingo (daily driver, local filesystem access)
LiteLLM gateway: iguana:4000 (Tailscale)
MCP transport: HTTP/SSE
Port: 3200 (default, configurable)
Registered in: .context/mcp.json → propagated to all harnesses via task context:sync

Deferred

Generic phase engine (Option C): revisit when 4+ skills exist and shared phase shape is visible. Saved in project memory.
Session registration: stateless for now, revisit if per-session context becomes valuable.
Authentication: internal Tailscale network only for now.

7.4 KiB Raw Blame History