Captures the full architecture: Go MCP server on flamingo, Claude supervisor+worker instance, LiteLLM/Ollama gradual delegation, TDD as first skill, language-agnostic test runner detection, structured JSON output contract, and three-layer model selection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7.4 KiB
Supervisor MCP Server — Design Spec
Date: 2026-04-16
Status: approved
Author: Mathias + Claude
Overview
The supervisor is a Go MCP server running locally on flamingo. It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. The first skill is TDD (red/green/refactor). The supervisor spawns a Claude instance to act as supervisor + worker, which runs verification commands directly on the target project's filesystem and optionally delegates code generation to Ollama models via LiteLLM.
Architecture
Claude Code (orchestrator session)
│
│ MCP tool call e.g. tdd_red(project_root, spec)
▼
┌─────────────────────────────┐
│ supervisor (Go, flamingo) │ thin — MCP protocol + process spawner only
└─────────────┬───────────────┘
│ exec: claude --print "<prompt>"
│ cwd: config/supervisor/ ← loads supervisor CLAUDE.md
▼
┌─────────────────────────────────────────┐
│ Claude supervisor instance │
│ reads: config/supervisor/CLAUDE.md │ orchestration rules, output contract
│ reads: config/supervisor/tdd.md │ injected per skill invocation
│ tools: Read, Write, Bash │ filesystem access + test runner
│ optional: LiteLLM → Ollama worker │ code generation delegation (gradual)
└─────────────────────────────────────────┘
│
│ structured JSON response
▼
MCP response → Claude Code
Key properties:
- Go server knows nothing about TDD — it routes, spawns, and validates JSON
- All orchestration logic is declarative in
config/supervisor/CLAUDE.md - Ollama migration path requires no architectural changes — supervisor Claude delegates generation to LiteLLM and retains verification
- Stateless — project path passed per call, no session registration
Project Structure
supervisor/
├── cmd/
│ └── supervisor/
│ └── main.go ← starts MCP server, wires dependencies
├── internal/
│ ├── config/ ← env vars + models.yaml → typed Config struct
│ ├── mcp/ ← HTTP/SSE MCP server, tool dispatch
│ ├── registry/ ← maps tool names to skill handlers
│ ├── skills/
│ │ └── tdd/ ← tdd_red, tdd_green, tdd_refactor handlers
│ └── exec/ ← spawns claude --print, validates JSON output
├── config/
│ └── supervisor/
│ ├── CLAUDE.md ← permanent orchestration rules
│ └── tdd.md ← TDD discipline, injected per TDD call
├── docs/
│ └── superpowers/specs/ ← design specs (this file)
├── .env.example
└── Taskfile.yml
MCP Tools
TDD Skill
All tools are stateless. Project path is passed on every call.
| Tool | Required inputs | Optional inputs | Returns |
|---|---|---|---|
tdd_red |
project_root, spec |
model, test_cmd |
JSON result |
tdd_green |
project_root, test_path |
model, test_cmd |
JSON result |
tdd_refactor |
project_root, test_path, impl_path |
model, test_cmd |
JSON result |
Response schema (all tools)
{
"status": "pass | fail | error",
"phase": "red | green | refactor",
"skill": "tdd",
"file_path": "<absolute path to generated/modified file>",
"runner_output": "<raw stdout+stderr from test runner>",
"verified": true,
"model_used": "<model that generated the code, or 'self'>",
"message": "<one sentence summary>"
}
verified reflects the subprocess exit code only — never the model's self-assessment.
Model Selection
Three-layer priority (highest to lowest):
modelparameter in the MCP tool call — caller override- Per-skill default in
config/models.yaml - Global fallback default
The Go skill handler reads the resolved model name and passes it to the Claude instance via the prompt (not an env var), so the supervisor Claude knows which model to delegate to per invocation.
# config/models.yaml
default: ollama/qwen3-coder-30b-tuned
skills:
tdd: ollama/qwen3-coder-30b-tuned
review: ollama/devstral-tuned
debug: ollama/deepseek-r1-tuned
LiteLLM on iguana is the gateway for all Ollama and cloud model calls. The Go server passes LITELLM_BASE_URL and the resolved model name to the spawned Claude instance via the prompt context.
Supervisor CLAUDE.md
Loaded for every invocation. Defines identity, output contract, verification rules, loop prevention, and test runner detection.
Output contract: Every response is raw JSON matching the schema above. No preamble, no markdown, no prose. Malformed output is treated as a failed invocation and triggers a retry.
Verification: Exit code is the only source of truth. verified: true only when the subprocess exit code matches the phase expectation.
Loop prevention:
- Maximum 3 attempts per phase
- Two consecutive identical outputs → immediate
status: error - No retries after 3 failures
Test runner detection (inspects project_root):
| Signal | Command |
|---|---|
go.mod |
go test ./... |
package.json |
npm test |
pyproject.toml / pytest.ini |
pytest |
Cargo.toml |
cargo test |
Gemfile |
bundle exec rspec |
mix.exs |
mix test |
test_cmd parameter bypasses detection entirely.
Skill Discipline: TDD (config/supervisor/tdd.md)
Read from disk by the Go TDD handler and appended to the prompt string on every TDD tool call. Not part of the always-on CLAUDE.md.
- Red: Write one test, one behavior. Run suite. Confirm it fails. If it passes, the test is wrong — return
fail. - Green: Write minimal code to pass. Nothing beyond what the test requires. Run suite. Confirm it passes. Fix code, not test.
- Refactor: Structure only, no new behavior. Tests must stay green after every change.
Gradual Ollama Migration
Start: Claude supervisor + worker (does everything itself).
Next: Claude delegates code generation to qwen3-coder-30b-tuned via LiteLLM, retains verification and judgment.
Later: More skills, more workers, model routing by task type.
No architectural changes required between stages — only config/models.yaml and supervisor/CLAUDE.md prompting strategy evolve.
Infrastructure
- Runs on: flamingo (daily driver, local filesystem access)
- LiteLLM gateway: iguana:4000 (Tailscale)
- MCP transport: HTTP/SSE
- Port: 3200 (default, configurable)
- Registered in:
.context/mcp.json→ propagated to all harnesses viatask context:sync
Deferred
- Generic phase engine (Option C): revisit when 4+ skills exist and shared phase shape is visible. Saved in project memory.
- Session registration: stateless for now, revisit if per-session context becomes valuable.
- Authentication: internal Tailscale network only for now.