12 TDD tasks covering: Go module, Result type, Config, Models, Executor (claude --print), Registry, MCP server, TDD skill, main wiring, config files, Taskfile/MCP registration, smoke test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Supervisor MCP Server Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a Go MCP server that exposes TDD skill workers as tools Claude Code can call natively, with a Claude supervisor instance handling orchestration and verification.

**Architecture:** A thin Go HTTP server routes MCP tool calls to skill handlers. Each handler spawns `claude --print` with `--bare --system-prompt --json-schema --permission-mode bypassPermissions`, injecting supervisor rules and skill discipline. The spawned Claude instance uses its own Bash/Read/Write tools to run tests and generate code, returning a structured JSON result. LiteLLM on iguana handles Ollama model delegation.

**Tech Stack:** Go 1.26, net/http (stdlib), gopkg.in/yaml.v3, testify, claude CLI, LiteLLM

---

## File Map

| File | Responsibility |
|------|---------------|
| `go.mod` | Module definition |
| `cmd/supervisor/main.go` | Entry point, wires all dependencies |
| `internal/config/config.go` | Env vars → typed Config struct |
| `internal/config/config_test.go` | Config parsing tests |
| `internal/config/models.go` | YAML model routing table |
| `internal/config/models_test.go` | Model resolution tests |
| `internal/exec/result.go` | Result struct + JSON schema constant |
| `internal/exec/result_test.go` | JSON parsing/validation tests |
| `internal/exec/executor.go` | Spawn claude, capture output, parse result |
| `internal/exec/executor_test.go` | Executor tests using fake binary |
| `internal/registry/registry.go` | Skill interface + registry |
| `internal/registry/registry_test.go` | Registry routing tests |
| `internal/mcp/server.go` | HTTP/JSON-RPC MCP server |
| `internal/mcp/server_test.go` | MCP protocol tests |
| `internal/skills/tdd/skill.go` | Tool definitions (MCP schema) |
| `internal/skills/tdd/handlers.go` | tdd_red, tdd_green, tdd_refactor |
| `internal/skills/tdd/handlers_test.go` | Handler prompt-building tests |
| `config/supervisor/CLAUDE.md` | Supervisor orchestration rules |
| `config/supervisor/tdd.md` | TDD iron law (injected per call) |
| `config/models.yaml` | Per-skill model routing table |
| `.env.example` | Documented env vars |
| `Taskfile.yml` | Add `supervisor:dev` and `supervisor:build` tasks |
| `.context/mcp.json` | Register supervisor as MCP server |

---

## Task 1: Go module + project skeleton

**Files:**
- Create: `go.mod`
- Create: `cmd/supervisor/main.go`

- [ ] **Step 1: Write a failing test to verify the binary compiles and exits cleanly**

Create `cmd/supervisor/main_test.go`:
```go
package main

import (
	"os/exec"
	"testing"
)

func TestBinaryCompiles(t *testing.T) {
	cmd := exec.Command("go", "build", "./...")
	out, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatalf("build failed: %s\n%s", err, out)
	}
}
```

- [ ] **Step 2: Run test — expect it to fail (no go.mod yet)**

```bash
go test ./...
```
Expected: error — no go.mod

- [ ] **Step 3: Initialize the module**

```bash
go mod init github.com/mathiasbq/supervisor
```

- [ ] **Step 4: Create `cmd/supervisor/main.go`**

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	fmt.Fprintln(os.Stderr, "supervisor: not yet implemented")
	os.Exit(1)
}
```

- [ ] **Step 5: Run test — expect it to pass**

```bash
go test ./cmd/supervisor/...
```
Expected: PASS

- [ ] **Step 6: Commit**

```bash
git add go.mod cmd/
git commit -m "chore: initialize go module and cmd skeleton"
```

---

## Task 2: Result type

**Files:**
- Create: `internal/exec/result.go`
- Create: `internal/exec/result_test.go`

- [ ] **Step 1: Write failing tests for result parsing**

Create `internal/exec/result_test.go`:
```go
package exec_test

import (
	"encoding/json"
	"testing"

	"github.com/mathiasbq/supervisor/internal/exec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestResultParsesValidJSON(t *testing.T) {
	raw := `{
		"status": "pass",
		"phase": "red",
		"skill": "tdd",
		"file_path": "/tmp/foo_test.go",
		"runner_output": "--- FAIL: TestFoo",
		"verified": true,
		"model_used": "self",
		"message": "test fails as expected"
	}`
	var r exec.Result
	require.NoError(t, json.Unmarshal([]byte(raw), &r))
	assert.Equal(t, "pass", r.Status)
	assert.Equal(t, "red", r.Phase)
	assert.True(t, r.Verified)
}

func TestResultValidation(t *testing.T) {
	tests := []struct {
		name    string
		result  exec.Result
		wantErr bool
	}{
		{
			name: "valid pass result",
			result: exec.Result{
				Status: "pass", Phase: "red", Skill: "tdd",
				FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL",
				Verified: true, ModelUsed: "self", Message: "ok",
			},
			wantErr: false,
		},
		{
			name:    "empty status",
			result:  exec.Result{Phase: "red", Skill: "tdd"},
			wantErr: true,
		},
		{
			name:    "invalid status",
			result:  exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"},
			wantErr: true,
		},
		{
			name:    "invalid phase",
			result:  exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"},
			wantErr: true,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := tt.result.Validate()
			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
```

- [ ] **Step 2: Run — expect FAIL (package does not exist)**

```bash
go test ./internal/exec/...
```
Expected: FAIL — cannot find package

- [ ] **Step 3: Install testify**

```bash
go get github.com/stretchr/testify@latest
```

- [ ] **Step 4: Create `internal/exec/result.go`**

```go
package exec

import (
	"errors"
	"strings"
)

// Result is the structured JSON output from every supervisor invocation.
// The JSON schema constant is passed to claude via --json-schema so Claude
// validates its own output before returning.
type Result struct {
	Status       string `json:"status"`        // pass | fail | error
	Phase        string `json:"phase"`         // red | green | refactor
	Skill        string `json:"skill"`         // tdd | review | ...
	FilePath     string `json:"file_path"`     // absolute path to generated file
	RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
	Verified     bool   `json:"verified"`      // based on exit code, never self-report
	ModelUsed    string `json:"model_used"`    // model name or "self"
	Message      string `json:"message"`       // one sentence summary
}

var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{"red": true, "green": true, "refactor": true}

func (r Result) Validate() error {
	var errs []string
	if !validStatuses[r.Status] {
		errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
	}
	if !validPhases[r.Phase] {
		errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
	}
	if r.Skill == "" {
		errs = append(errs, "skill is required")
	}
	if len(errs) > 0 {
		return errors.New(strings.Join(errs, "; "))
	}
	return nil
}

// Schema is passed to claude --json-schema to enforce structured output.
const Schema = `{
  "type": "object",
  "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
  "properties": {
    "status": {"type": "string", "enum": ["pass","fail","error"]},
    "phase": {"type": "string", "enum": ["red","green","refactor"]},
    "skill": {"type": "string"},
    "file_path": {"type": "string"},
    "runner_output": {"type": "string"},
    "verified": {"type": "boolean"},
    "model_used": {"type": "string"},
    "message": {"type": "string"}
  }
}`
```

- [ ] **Step 5: Run — expect PASS**

```bash
go test ./internal/exec/...
```
Expected: PASS

- [ ] **Step 6: Commit**

```bash
git add internal/exec/result.go internal/exec/result_test.go
git commit -m "feat: add Result type with JSON schema and validation"
```

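The validation rules in Task 2 collect every problem into one error instead of stopping at the first. The standalone sketch below illustrates that behaviour with a trimmed copy of `Result` that keeps only the three fields `Validate` inspects; the `main` function and the sample values are illustrative assumptions, not part of the plan.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Result is a trimmed, illustrative copy of the plan's exec.Result.
// Only the fields checked by Validate are kept.
type Result struct {
	Status string
	Phase  string
	Skill  string
}

var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{"red": true, "green": true, "refactor": true}

// Validate mirrors the plan's logic: gather all violations, then join them.
func (r Result) Validate() error {
	var errs []string
	if !validStatuses[r.Status] {
		errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
	}
	if !validPhases[r.Phase] {
		errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
	}
	if r.Skill == "" {
		errs = append(errs, "skill is required")
	}
	if len(errs) > 0 {
		return errors.New(strings.Join(errs, "; "))
	}
	return nil
}

func main() {
	// Two violations (bad status, missing skill) are reported together.
	bad := Result{Status: "unknown", Phase: "red"}
	fmt.Println(bad.Validate())
	// prints: status must be pass|fail|error, got: unknown; skill is required

	good := Result{Status: "pass", Phase: "green", Skill: "tdd"}
	fmt.Println(good.Validate())
	// prints: <nil>
}
```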
---

## Task 3: Config package

**Files:**
- Create: `internal/config/config.go`
- Create: `internal/config/config_test.go`

- [ ] **Step 1: Write failing tests**

Create `internal/config/config_test.go`:
```go
package config_test

import (
	"testing"

	"github.com/mathiasbq/supervisor/internal/config"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestLoadDefaults(t *testing.T) {
	t.Setenv("SUPERVISOR_PORT", "")
	t.Setenv("LITELLM_BASE_URL", "")
	t.Setenv("LITELLM_API_KEY", "")
	t.Setenv("SUPERVISOR_CONFIG_DIR", "")

	cfg, err := config.Load()
	require.NoError(t, err)
	assert.Equal(t, "3200", cfg.Port)
	assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL)
	assert.Equal(t, "./config/supervisor", cfg.ConfigDir)
}

func TestLoadFromEnv(t *testing.T) {
	t.Setenv("SUPERVISOR_PORT", "4000")
	t.Setenv("LITELLM_BASE_URL", "http://localhost:4000")
	t.Setenv("LITELLM_API_KEY", "test-key")
	t.Setenv("SUPERVISOR_CONFIG_DIR", "/etc/supervisor")

	cfg, err := config.Load()
	require.NoError(t, err)
	assert.Equal(t, "4000", cfg.Port)
	assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL)
	assert.Equal(t, "test-key", cfg.LiteLLMAPIKey)
	assert.Equal(t, "/etc/supervisor", cfg.ConfigDir)
}
```

- [ ] **Step 2: Run — expect FAIL**

```bash
go test ./internal/config/...
```
Expected: FAIL — cannot find package

- [ ] **Step 3: Create `internal/config/config.go`**

```go
package config

import "os"

type Config struct {
	Port           string // SUPERVISOR_PORT, default 3200
	LiteLLMBaseURL string // LITELLM_BASE_URL, default http://iguana:4000
	LiteLLMAPIKey  string // LITELLM_API_KEY
	ConfigDir      string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
	ModelsFile     string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
}

func Load() (Config, error) {
	cfg := Config{
		Port:           envOr("SUPERVISOR_PORT", "3200"),
		LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"),
		LiteLLMAPIKey:  os.Getenv("LITELLM_API_KEY"),
		ConfigDir:      envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"),
	}
	cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
	return cfg, nil
}

func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}
```

- [ ] **Step 4: Run — expect PASS**

```bash
go test ./internal/config/...
```
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add internal/config/
git commit -m "feat: add config package with env-var loading"
```

---

## Task 4: Models config

**Files:**
- Create: `internal/config/models.go`
- Create: `internal/config/models_test.go`
- Create: `config/models.yaml`

- [ ] **Step 1: Write failing tests**

Create `internal/config/models_test.go`:
```go
package config_test

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/mathiasbq/supervisor/internal/config"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestModelsResolve(t *testing.T) {
	yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
`
	f := filepath.Join(t.TempDir(), "models.yaml")
	require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

	m, err := config.LoadModels(f)
	require.NoError(t, err)

	assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
	assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
	assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
}

func TestModelsOverride(t *testing.T) {
	yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
`
	f := filepath.Join(t.TempDir(), "models.yaml")
	require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

	m, err := config.LoadModels(f)
	require.NoError(t, err)

	// caller override wins over skill default
	assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6"))
}
```

- [ ] **Step 2: Run — expect FAIL**

```bash
go test ./internal/config/...
```
Expected: FAIL — Models type undefined

- [ ] **Step 3: Install yaml dependency**

```bash
go get gopkg.in/yaml.v3
```

- [ ] **Step 4: Create `internal/config/models.go`**

```go
package config

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

type modelsFile struct {
	Default string            `yaml:"default"`
	Skills  map[string]string `yaml:"skills"`
}

type Models struct {
	data modelsFile
}

func LoadModels(path string) (Models, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return Models{}, fmt.Errorf("load models: %w", err)
	}
	var f modelsFile
	if err := yaml.Unmarshal(raw, &f); err != nil {
		return Models{}, fmt.Errorf("parse models: %w", err)
	}
	return Models{data: f}, nil
}

// Resolve returns the model for a skill, respecting three-layer priority:
//  1. override (from MCP call) — highest
//  2. per-skill default from models.yaml
//  3. global default
func (m Models) Resolve(skill, override string) string {
	if override != "" {
		return override
	}
	if model, ok := m.data.Skills[skill]; ok {
		return model
	}
	return m.data.Default
}
```

- [ ] **Step 5: Run — expect PASS**

```bash
go test ./internal/config/...
```
Expected: PASS

- [ ] **Step 6: Create `config/models.yaml`**

```yaml
# Model routing table — three-layer priority:
# 1. model param in MCP tool call (caller override)
# 2. per-skill entry here
# 3. default (fallback)
default: ollama/qwen3-coder-30b-tuned

skills:
  tdd: ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
  debug: ollama/deepseek-r1-tuned
```

- [ ] **Step 7: Commit**

```bash
git add internal/config/models.go internal/config/models_test.go config/models.yaml
git commit -m "feat: add model routing table with three-layer priority"
```

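The three-layer priority from Task 4 can be traced without any YAML parsing. The sketch below is a dependency-free stand-in: `routing` and `resolve` are hypothetical names that mirror `modelsFile` and `Models.Resolve`, and the table values are copied from `config/models.yaml`.

```go
package main

import "fmt"

// routing is a hypothetical in-memory stand-in for the parsed models.yaml.
type routing struct {
	Default string
	Skills  map[string]string
}

// resolve applies the same order as Models.Resolve:
// caller override, then per-skill entry, then global default.
func resolve(rt routing, skill, override string) string {
	if override != "" {
		return override
	}
	if model, ok := rt.Skills[skill]; ok {
		return model
	}
	return rt.Default
}

func main() {
	rt := routing{
		Default: "ollama/qwen3-coder-30b-tuned",
		Skills: map[string]string{
			"tdd":    "ollama/qwen3-coder-30b-tuned",
			"review": "ollama/devstral-tuned",
			"debug":  "ollama/deepseek-r1-tuned",
		},
	}

	// Layer 1: an override from the MCP call beats everything.
	fmt.Println(resolve(rt, "review", "anthropic/claude-sonnet-4-6"))
	// prints: anthropic/claude-sonnet-4-6

	// Layer 2: the per-skill entry is used when no override is given.
	fmt.Println(resolve(rt, "review", ""))
	// prints: ollama/devstral-tuned

	// Layer 3: unknown skills fall back to the global default.
	fmt.Println(resolve(rt, "docs", ""))
	// prints: ollama/qwen3-coder-30b-tuned
}
```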
---

## Task 5: Exec package

**Files:**
- Create: `internal/exec/executor.go`
- Create: `internal/exec/executor_test.go`
- Modify: `internal/exec/result.go` (no change needed; the `Schema` constant was already added in Task 2)

- [ ] **Step 1: Write failing tests**

Create `internal/exec/executor_test.go`:
```go
package exec_test

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"testing"
	"time"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// fakeClaudePath writes a shell script that stands in for the claude binary
// and returns its path. With exitCode 0 the script prints output; otherwise
// it writes to stderr and exits with exitCode. Used to test the executor
// without invoking real claude.
func fakeClaudePath(t *testing.T, output string, exitCode int) string {
	t.Helper()
	dir := t.TempDir()
	script := filepath.Join(dir, "claude")
	var content string
	if exitCode != 0 {
		content = fmt.Sprintf("#!/bin/sh\necho 'error' >&2\nexit %d\n", exitCode)
	} else {
		content = "#!/bin/sh\necho '" + output + "'\n"
	}
	require.NoError(t, os.WriteFile(script, []byte(content), 0755))
	return script
}

func TestExecutorParsesValidResult(t *testing.T) {
	validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}`
	claude := fakeClaudePath(t, validJSON, 0)

	ex := iexec.New(iexec.Config{
		ClaudeBinary: claude,
		SystemPrompt: "you are a supervisor",
		Timeout:      5 * time.Second,
	})

	result, err := ex.Run(context.Background(), iexec.Request{
		SkillPrompt: "tdd rules",
		TaskPrompt:  "run red phase",
	})
	require.NoError(t, err)
	assert.Equal(t, "pass", result.Status)
	assert.True(t, result.Verified)
}

func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) {
	claude := fakeClaudePath(t, "", 1)

	ex := iexec.New(iexec.Config{
		ClaudeBinary: claude,
		SystemPrompt: "you are a supervisor",
		Timeout:      5 * time.Second,
	})

	_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"})
	assert.Error(t, err)
}

func TestExecutorTimesOut(t *testing.T) {
	dir := t.TempDir()
	script := filepath.Join(dir, "claude")
	require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755))

	ex := iexec.New(iexec.Config{
		ClaudeBinary: script,
		SystemPrompt: "you are a supervisor",
		Timeout:      100 * time.Millisecond,
	})

	_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
	assert.ErrorContains(t, err, "timeout")
}
```

- [ ] **Step 2: Vet the test file (expect only undefined-symbol errors)**

```bash
go vet ./internal/exec/...
```
Expected: `go vet` fails only because `iexec.New`, `iexec.Config`, and `iexec.Request` are not defined yet; there should be no syntax errors

- [ ] **Step 3: Run — expect FAIL**

```bash
go test ./internal/exec/...
```
Expected: FAIL — Executor type undefined

- [ ] **Step 4: Create `internal/exec/executor.go`**

```go
package exec

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// Config holds executor configuration.
type Config struct {
	ClaudeBinary   string        // path to claude binary, defaults to "claude"
	SystemPrompt   string        // contents of supervisor CLAUDE.md
	Timeout        time.Duration // per-invocation timeout, default 120s
	LiteLLMBaseURL string        // passed to Claude so it can delegate to Ollama
	LiteLLMAPIKey  string        // passed to Claude for LiteLLM auth
}

// Request is the input to a single supervisor invocation.
type Request struct {
	SkillPrompt string // skill-specific discipline (e.g. tdd.md contents)
	TaskPrompt  string // the specific task (phase, project_root, spec, model)
	Model       string // resolved model name, passed in task prompt
	Tools       string // comma-separated allowed tools, default "Bash,Read,Write"
}

// Executor spawns a claude instance and captures its structured JSON output.
type Executor struct {
	cfg Config
}

func New(cfg Config) *Executor {
	if cfg.ClaudeBinary == "" {
		cfg.ClaudeBinary = "claude"
	}
	if cfg.Timeout == 0 {
		cfg.Timeout = 120 * time.Second
	}
	return &Executor{cfg: cfg}
}

func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
	ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout)
	defer cancel()

	tools := req.Tools
	if tools == "" {
		tools = "Bash,Read,Write"
	}

	// Build the full prompt: system rules + skill rules + task + infra context
	litellmCtx := fmt.Sprintf(
		"LITELLM_BASE_URL: %s\nLITELLM_API_KEY: %s",
		e.cfg.LiteLLMBaseURL, e.cfg.LiteLLMAPIKey,
	)
	prompt := strings.Join([]string{
		e.cfg.SystemPrompt,
		"---",
		req.SkillPrompt,
		"---",
		litellmCtx,
		"---",
		req.TaskPrompt,
	}, "\n\n")

	args := []string{
		"--print",
		"--bare",
		"--permission-mode", "bypassPermissions",
		"--tools", tools,
		"--json-schema", Schema,
		"--output-format", "text",
		prompt,
	}

	cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	// Without WaitDelay, Run can block after the timeout kills the process
	// if a grandchild (e.g. `sleep` started by a shell script) still holds
	// the stdout/stderr pipe open.
	cmd.WaitDelay = time.Second

	if err := cmd.Run(); err != nil {
		if ctx.Err() != nil {
			return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout)
		}
		return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
	}

	var r Result
	if err := json.Unmarshal(stdout.Bytes(), &r); err != nil {
		return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String())
	}
	if err := r.Validate(); err != nil {
		return Result{}, fmt.Errorf("invalid result: %w", err)
	}
	return r, nil
}
```

- [ ] **Step 5: Run — expect PASS**

```bash
go test ./internal/exec/...
```
Expected: PASS

- [ ] **Step 6: Commit**

```bash
git add internal/exec/executor.go internal/exec/executor_test.go
git commit -m "feat: add executor that spawns claude and parses JSON result"
```

---

## Task 6: Registry + Skill interface

**Files:**
- Create: `internal/registry/registry.go`
- Create: `internal/registry/registry_test.go`

- [ ] **Step 1: Write failing tests**

Create `internal/registry/registry_test.go`:
```go
package registry_test

import (
	"context"
	"encoding/json"
	"testing"

	"github.com/mathiasbq/supervisor/internal/registry"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

type stubSkill struct {
	name string
}

func (s stubSkill) Name() string { return s.name }
func (s stubSkill) Tools() []registry.ToolDef {
	return []registry.ToolDef{{
		Name:        s.name + "_tool",
		Description: "stub tool",
		InputSchema: json.RawMessage(`{"type":"object","properties":{}}`),
	}}
}
func (s stubSkill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	return json.RawMessage(`{"ok":true}`), nil
}

func TestRegistryRoutes(t *testing.T) {
	r := registry.New()
	r.Register(stubSkill{name: "tdd"})

	result, err := r.Dispatch(context.Background(), "tdd_tool", json.RawMessage(`{}`))
	require.NoError(t, err)
	assert.JSONEq(t, `{"ok":true}`, string(result))
}

func TestRegistryUnknownTool(t *testing.T) {
	r := registry.New()
	_, err := r.Dispatch(context.Background(), "unknown_tool", json.RawMessage(`{}`))
	assert.ErrorContains(t, err, "unknown tool")
}

func TestRegistryListsTools(t *testing.T) {
	r := registry.New()
	r.Register(stubSkill{name: "tdd"})

	tools := r.Tools()
	require.Len(t, tools, 1)
	assert.Equal(t, "tdd_tool", tools[0].Name)
}
```

- [ ] **Step 2: Run — expect FAIL**

```bash
go test ./internal/registry/...
```
Expected: FAIL — registry package not found

- [ ] **Step 3: Create `internal/registry/registry.go`**

```go
package registry

import (
	"context"
	"encoding/json"
	"fmt"
)

// ToolDef describes a single MCP tool exposed by a skill.
type ToolDef struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	InputSchema json.RawMessage `json:"inputSchema"`
}

// Skill is implemented by each skill package (tdd, review, etc.).
type Skill interface {
	Name() string
	Tools() []ToolDef
	Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error)
}

// Registry routes MCP tool calls to the correct skill handler.
type Registry struct {
	skills map[string]Skill // tool name → skill
	tools  []ToolDef
}

func New() *Registry {
	return &Registry{skills: make(map[string]Skill)}
}

func (r *Registry) Register(s Skill) {
	for _, t := range s.Tools() {
		r.skills[t.Name] = s
		r.tools = append(r.tools, t)
	}
}

func (r *Registry) Tools() []ToolDef {
	return r.tools
}

func (r *Registry) Dispatch(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	s, ok := r.skills[tool]
	if !ok {
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
	return s.Handle(ctx, tool, args)
}
```

- [ ] **Step 4: Run — expect PASS**

```bash
go test ./internal/registry/...
```
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add internal/registry/
git commit -m "feat: add skill registry with tool routing"
```

---

## Task 7: MCP server

**Files:**
- Create: `internal/mcp/server.go`
- Create: `internal/mcp/server_test.go`

- [ ] **Step 1: Write failing tests**

Create `internal/mcp/server_test.go`:
```go
package mcp_test

import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/mathiasbq/supervisor/internal/mcp"
	"github.com/mathiasbq/supervisor/internal/registry"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func jsonBody(t *testing.T, v any) *bytes.Buffer {
	t.Helper()
	b, err := json.Marshal(v)
	require.NoError(t, err)
	return bytes.NewBuffer(b)
}

func TestMCPInitialize(t *testing.T) {
	reg := registry.New()
	srv := mcp.NewServer(reg)

	req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "initialize",
		"params":  map[string]any{},
	}))
	req.Header.Set("Content-Type", "application/json")
	rr := httptest.NewRecorder()

	srv.ServeHTTP(rr, req)

	assert.Equal(t, http.StatusOK, rr.Code)
	var resp map[string]any
	require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
	result := resp["result"].(map[string]any)
	assert.Equal(t, "2024-11-05", result["protocolVersion"])
}

func TestMCPToolsList(t *testing.T) {
	reg := registry.New()
	srv := mcp.NewServer(reg)

	req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
		"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": map[string]any{},
	}))
	req.Header.Set("Content-Type", "application/json")
	rr := httptest.NewRecorder()
	srv.ServeHTTP(rr, req)

	assert.Equal(t, http.StatusOK, rr.Code)
	var resp map[string]any
	require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
	result := resp["result"].(map[string]any)
	assert.NotNil(t, result["tools"])
}

func TestMCPUnknownMethod(t *testing.T) {
	reg := registry.New()
	srv := mcp.NewServer(reg)

	req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
		"jsonrpc": "2.0", "id": 3, "method": "unknown/method", "params": map[string]any{},
	}))
	req.Header.Set("Content-Type", "application/json")
	rr := httptest.NewRecorder()
	srv.ServeHTTP(rr, req)

	assert.Equal(t, http.StatusOK, rr.Code)
	var resp map[string]any
	require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
	assert.NotNil(t, resp["error"])
}
```

- [ ] **Step 2: Run — expect FAIL**

```bash
go test ./internal/mcp/...
```
Expected: FAIL — mcp package not found

- [ ] **Step 3: Create `internal/mcp/server.go`**

```go
package mcp

import (
	"encoding/json"
	"net/http"

	"github.com/mathiasbq/supervisor/internal/registry"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      any             `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type response struct {
	JSONRPC string    `json:"jsonrpc"`
	ID      any       `json:"id,omitempty"`
	Result  any       `json:"result,omitempty"`
	Error   *rpcError `json:"error,omitempty"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// Server is an HTTP handler implementing the MCP JSON-RPC protocol.
type Server struct {
	reg *registry.Registry
}

func NewServer(reg *registry.Registry) *Server {
	return &Server{reg: reg}
}

func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var req request
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeError(w, nil, -32700, "parse error")
		return
	}

	var result any
	var rpcErr *rpcError

	switch req.Method {
	case "initialize":
		result = map[string]any{
			"protocolVersion": "2024-11-05",
			"capabilities":    map[string]any{"tools": map[string]any{}},
			"serverInfo":      map[string]any{"name": "supervisor", "version": "0.1.0"},
		}

	case "tools/list":
		result = map[string]any{"tools": s.reg.Tools()}

	case "tools/call":
		var p struct {
			Name      string          `json:"name"`
			Arguments json.RawMessage `json:"arguments"`
		}
		if err := json.Unmarshal(req.Params, &p); err != nil {
			rpcErr = &rpcError{Code: -32602, Message: "invalid params"}
			break
		}
		// Use the request context so a client disconnect cancels the tool call.
		out, err := s.reg.Dispatch(r.Context(), p.Name, p.Arguments)
		if err != nil {
			rpcErr = &rpcError{Code: -32000, Message: err.Error()}
			break
		}
		result = map[string]any{
			"content": []map[string]any{{"type": "text", "text": string(out)}},
		}

	default:
		rpcErr = &rpcError{Code: -32601, Message: "method not found: " + req.Method}
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response{
		JSONRPC: "2.0",
		ID:      req.ID,
		Result:  result,
		Error:   rpcErr,
	})
}

func writeError(w http.ResponseWriter, id any, code int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response{
		JSONRPC: "2.0",
		ID:      id,
		Error:   &rpcError{Code: code, Message: msg},
	})
}
```

- [ ] **Step 4: Run — expect PASS**

```bash
go test ./internal/mcp/...
```
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add internal/mcp/
git commit -m "feat: add MCP HTTP server with JSON-RPC 2.0 transport"
```

---

## Task 8: TDD skill

**Files:**
- Create: `internal/skills/tdd/skill.go`
- Create: `internal/skills/tdd/handlers.go`
- Create: `internal/skills/tdd/handlers_test.go`

- [ ] **Step 1: Write failing tests**

Create `internal/skills/tdd/handlers_test.go`:
```go
package tdd_test

import (
	"context"
	"encoding/json"
	"testing"

	"github.com/mathiasbq/supervisor/internal/skills/tdd"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestTDDSkillTools(t *testing.T) {
	skill := tdd.New(tdd.Config{
		SystemPrompt: "supervisor rules",
		SkillPrompt:  "tdd rules",
		ExecutorFn:   nil, // will be set in TestHandle
	})
	tools := skill.Tools()
	names := make([]string, len(tools))
	// Loop variable is named tool so it does not shadow the *testing.T.
	for i, tool := range tools {
		names[i] = tool.Name
	}
	assert.ElementsMatch(t, []string{"tdd_red", "tdd_green", "tdd_refactor"}, names)
}

func TestTDDSkillHandleUnknown(t *testing.T) {
	skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
	_, err := skill.Handle(context.Background(), "tdd_unknown", json.RawMessage(`{}`))
	assert.ErrorContains(t, err, "unknown tool")
}

func TestTDDRedRequiresProjectRoot(t *testing.T) {
	skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
	_, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"spec":"add two numbers"}`))
	assert.ErrorContains(t, err, "project_root")
}

func TestTDDRedRequiresSpec(t *testing.T) {
	skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
	_, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"project_root":"/tmp/proj"}`))
	assert.ErrorContains(t, err, "spec")
}
```

- [ ] **Step 2: Run — expect FAIL**
|
|
|
|
```bash
|
|
go test ./internal/skills/tdd/...
|
|
```
|
|
Expected: FAIL — tdd package not found
|
|
|
|
- [ ] **Step 3: Create `internal/skills/tdd/skill.go`**
|
|
|
|
```go
package tdd

import (
	"context"
	"encoding/json"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/registry"
)

// ExecutorFn allows injecting a test double for the executor.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
	SystemPrompt string
	SkillPrompt  string
	ExecutorFn   ExecutorFn // required: tool calls fail with "no executor configured" when nil
	DefaultModel string
}

type Skill struct {
	cfg Config
}

func New(cfg Config) *Skill {
	return &Skill{cfg: cfg}
}

func (s *Skill) Name() string { return "tdd" }

func (s *Skill) Tools() []registry.ToolDef {
	// schema builds a JSON Schema object for a tool's input. The marshal
	// error is ignored because the input is a fixed, serializable map.
	schema := func(required []string, props map[string]any) json.RawMessage {
		b, _ := json.Marshal(map[string]any{
			"type":       "object",
			"required":   required,
			"properties": props,
		})
		return b
	}
	strProp := map[string]any{"type": "string"}

	return []registry.ToolDef{
		{
			Name:        "tdd_red",
			Description: "Write a failing test for the described behavior. Verifies the test fails before returning.",
			InputSchema: schema(
				[]string{"project_root", "spec"},
				map[string]any{
					"project_root": strProp,
					"spec":         strProp,
					"model":        strProp,
					"test_cmd":     strProp,
				},
			),
		},
		{
			Name:        "tdd_green",
			Description: "Write minimal implementation to make the test at test_path pass.",
			InputSchema: schema(
				[]string{"project_root", "test_path"},
				map[string]any{
					"project_root": strProp,
					"test_path":    strProp,
					"model":        strProp,
					"test_cmd":     strProp,
				},
			),
		},
		{
			Name:        "tdd_refactor",
			Description: "Refactor the implementation at impl_path while keeping tests green.",
			InputSchema: schema(
				[]string{"project_root", "test_path", "impl_path"},
				map[string]any{
					"project_root": strProp,
					"test_path":    strProp,
					"impl_path":    strProp,
					"model":        strProp,
					"test_cmd":     strProp,
				},
			),
		},
	}
}
```
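
To make the tool definitions concrete, the sketch below marshals the `tdd_red` input schema on its own and prints the JSON a `tools/list` client would receive in `inputSchema`. It reproduces the `schema` closure as a standalone function; `inputSchema` and `redSchema` are hypothetical names used only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inputSchema (hypothetical name) mirrors the schema closure in Tools(),
// but returns the marshal error instead of discarding it.
func inputSchema(required []string, props map[string]any) (json.RawMessage, error) {
	return json.Marshal(map[string]any{
		"type":       "object",
		"required":   required,
		"properties": props,
	})
}

// redSchema builds the schema advertised for the tdd_red tool.
func redSchema() (string, error) {
	strProp := map[string]any{"type": "string"}
	b, err := inputSchema(
		[]string{"project_root", "spec"},
		map[string]any{
			"project_root": strProp,
			"spec":         strProp,
			"model":        strProp,
			"test_cmd":     strProp,
		},
	)
	return string(b), err
}

func main() {
	s, err := redSchema()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s)
}
```

Because `encoding/json` sorts map keys, the output is deterministic: `properties` comes first, then `required`, then `type`. `model` and `test_cmd` appear under `properties` but not in `required`, which is what makes them optional.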

- [ ] **Step 4: Create `internal/skills/tdd/handlers.go`**

```go
package tdd

import (
	"context"
	"encoding/json"
	"fmt"

	iexec "github.com/mathiasbq/supervisor/internal/exec"
)

// Handle dispatches a tool call to the handler for its TDD phase.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
	switch tool {
	case "tdd_red":
		return s.handleRed(ctx, args)
	case "tdd_green":
		return s.handleGreen(ctx, args)
	case "tdd_refactor":
		return s.handleRefactor(ctx, args)
	default:
		return nil, fmt.Errorf("unknown tool: %s", tool)
	}
}

type redArgs struct {
	ProjectRoot string `json:"project_root"`
	Spec        string `json:"spec"`
	Model       string `json:"model"`
	TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRed(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
	var args redArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if args.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if args.Spec == "" {
		return nil, fmt.Errorf("spec is required")
	}

	task := fmt.Sprintf(
		"phase: red\nproject_root: %s\nspec: %s\nmodel: %s\ntest_cmd: %s",
		args.ProjectRoot, args.Spec, s.resolveModel(args.Model), args.TestCmd,
	)
	return s.execute(ctx, task)
}

type greenArgs struct {
	ProjectRoot string `json:"project_root"`
	TestPath    string `json:"test_path"`
	Model       string `json:"model"`
	TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
	var args greenArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if args.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if args.TestPath == "" {
		return nil, fmt.Errorf("test_path is required")
	}

	task := fmt.Sprintf(
		"phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
		args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
	)
	return s.execute(ctx, task)
}

type refactorArgs struct {
	ProjectRoot string `json:"project_root"`
	TestPath    string `json:"test_path"`
	ImplPath    string `json:"impl_path"`
	Model       string `json:"model"`
	TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
	var args refactorArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("parse args: %w", err)
	}
	if args.ProjectRoot == "" {
		return nil, fmt.Errorf("project_root is required")
	}
	if args.TestPath == "" {
		return nil, fmt.Errorf("test_path is required")
	}
	if args.ImplPath == "" {
		return nil, fmt.Errorf("impl_path is required")
	}

	task := fmt.Sprintf(
		"phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
		args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
	)
	return s.execute(ctx, task)
}

// resolveModel prefers the per-call override and falls back to the skill default.
func (s *Skill) resolveModel(override string) string {
	if override != "" {
		return override
	}
	return s.cfg.DefaultModel
}

// execute runs the task through the configured executor and returns its
// Result as JSON. There is no built-in fallback executor.
func (s *Skill) execute(ctx context.Context, task string) (json.RawMessage, error) {
	if s.cfg.ExecutorFn == nil {
		return nil, fmt.Errorf("no executor configured")
	}

	req := iexec.Request{
		SkillPrompt: s.cfg.SkillPrompt,
		TaskPrompt:  task,
	}

	result, err := s.cfg.ExecutorFn(ctx, req)
	if err != nil {
		return nil, err
	}

	return json.Marshal(result)
}
```

- [ ] **Step 5: Run, expect PASS**

```bash
go test ./internal/skills/tdd/...
```
Expected: PASS

- [ ] **Step 6: Commit**

```bash
git add internal/skills/tdd/
git commit -m "feat: add TDD skill with red/green/refactor handlers"
```

---

## Task 9: Wire main.go

**Files:**
- Modify: `cmd/supervisor/main.go`
- Create: `cmd/supervisor/integration_test.go`

- [ ] **Step 1: Write failing integration test**

Create `cmd/supervisor/integration_test.go`:
```go
//go:build integration

package main_test

import (
	"bytes"
	"encoding/json"
	"net/http"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestServerStartsAndResponds(t *testing.T) {
	// This test does not start the server itself. Run it manually against a
	// server already listening on :3200, for example after `task supervisor:dev`.
	time.Sleep(500 * time.Millisecond) // short grace period for a freshly started server

	body, _ := json.Marshal(map[string]any{
		"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": map[string]any{},
	})
	resp, err := http.Post("http://localhost:3200/mcp", "application/json", bytes.NewBuffer(body))
	require.NoError(t, err)
	defer resp.Body.Close()

	assert.Equal(t, http.StatusOK, resp.StatusCode)
}
```

- [ ] **Step 2: Run unit tests, all still pass**

```bash
go test ./...
```
Expected: PASS (integration test skipped without build tag)

- [ ] **Step 3: Rewrite `cmd/supervisor/main.go`**

```go
package main

import (
	"log/slog"
	"net/http"
	"os"
	"path/filepath"

	"github.com/mathiasbq/supervisor/internal/config"
	iexec "github.com/mathiasbq/supervisor/internal/exec"
	"github.com/mathiasbq/supervisor/internal/mcp"
	"github.com/mathiasbq/supervisor/internal/registry"
	"github.com/mathiasbq/supervisor/internal/skills/tdd"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	cfg, err := config.Load()
	if err != nil {
		logger.Error("load config", "err", err)
		os.Exit(1)
	}

	models, err := config.LoadModels(cfg.ModelsFile)
	if err != nil {
		logger.Error("load models", "err", err)
		os.Exit(1)
	}

	// Supervisor rules shared by every skill.
	systemPrompt, err := os.ReadFile(filepath.Join(cfg.ConfigDir, "CLAUDE.md"))
	if err != nil {
		logger.Error("read supervisor CLAUDE.md", "err", err)
		os.Exit(1)
	}

	// TDD-specific discipline.
	tddPrompt, err := os.ReadFile(filepath.Join(cfg.ConfigDir, "tdd.md"))
	if err != nil {
		logger.Error("read tdd.md", "err", err)
		os.Exit(1)
	}

	executor := iexec.New(iexec.Config{
		SystemPrompt:   string(systemPrompt),
		LiteLLMBaseURL: cfg.LiteLLMBaseURL,
		LiteLLMAPIKey:  cfg.LiteLLMAPIKey,
	})

	reg := registry.New()
	reg.Register(tdd.New(tdd.Config{
		SystemPrompt: string(systemPrompt),
		SkillPrompt:  string(tddPrompt),
		DefaultModel: models.Resolve("tdd", ""),
		ExecutorFn:   executor.Run,
	}))

	srv := mcp.NewServer(reg)
	mux := http.NewServeMux()
	mux.Handle("/mcp", srv)

	addr := ":" + cfg.Port
	logger.Info("supervisor starting", "addr", addr)
	if err := http.ListenAndServe(addr, mux); err != nil {
		logger.Error("server error", "err", err)
		os.Exit(1)
	}
}
```

- [ ] **Step 4: Run all tests, expect PASS**

```bash
go test ./...
```
Expected: PASS

- [ ] **Step 5: Verify it builds**

```bash
go build ./cmd/supervisor/
```
Expected: binary produced, no errors

- [ ] **Step 6: Commit**

```bash
git add cmd/supervisor/main.go cmd/supervisor/integration_test.go
git commit -m "feat: wire all components in main.go"
```

---

## Task 10: Config files

**Files:**
- Create: `config/supervisor/CLAUDE.md`
- Create: `config/supervisor/tdd.md`
- Create: `.env.example`

- [ ] **Step 1: Create `config/supervisor/CLAUDE.md`**

```markdown
# Supervisor Agent

You are a supervisor agent. You orchestrate skill workers.
You do not converse. You do not explain. You execute and report.

## Output contract: NON-NEGOTIABLE

Every response must be raw JSON conforming to the provided schema.
No preamble, no markdown, no prose. Raw JSON only.

If you cannot produce valid JSON for any reason, output:
{"status":"error","phase":"","skill":"","file_path":"","runner_output":"","verified":false,"model_used":"self","message":"<reason>"}

## Verification: IRON LAW

Never trust your own output. Always run the test suite.
Use the subprocess exit code as the only source of truth.
Set `verified: true` only when the exit code matches the expected outcome for the phase.
Otherwise set `verified: false`, even if you believe the code is correct.

## Loop prevention

- Maximum 3 attempts per phase.
- If output is identical across two consecutive attempts, stop immediately.
- After 3 failures set status "error" and explain the blocker in message.
- Never retry indefinitely.

## Test runner detection

Inspect project_root for these signals in order:

| Signal                      | Command           |
|-----------------------------|-------------------|
| go.mod                      | go test ./...     |
| package.json                | npm test          |
| pyproject.toml / pytest.ini | pytest            |
| Cargo.toml                  | cargo test        |
| Gemfile                     | bundle exec rspec |
| mix.exs                     | mix test          |

If test_cmd is provided in the task, use it directly and skip detection.
If the runner cannot be determined, return status "error".

## Model delegation

When a model is specified in the task, you may use it for code generation
by calling LiteLLM at the LITELLM_BASE_URL with the specified model name.
Always retain verification yourself; never delegate test execution.
Tag model_used in your response with the model that generated the code,
or "self" if you generated it directly.
```
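
In this plan the detection rules are carried out by the spawned Claude instance following the prompt, not by Go code. Purely to make the precedence explicit, the rules can be written out as a sketch: `test_cmd` wins, then the first matching signal file in table order, otherwise no runner. `detectRunner`, `runnerSignals`, and `inDir` are hypothetical names that exist only in this illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runnerSignals lists marker files in the priority order of the prompt's table.
var runnerSignals = []struct {
	file string
	cmd  string
}{
	{"go.mod", "go test ./..."},
	{"package.json", "npm test"},
	{"pyproject.toml", "pytest"},
	{"pytest.ini", "pytest"},
	{"Cargo.toml", "cargo test"},
	{"Gemfile", "bundle exec rspec"},
	{"mix.exs", "mix test"},
}

// detectRunner (hypothetical) returns testCmd when it is set; otherwise the
// command for the first signal file that exists. ok is false when nothing matches.
func detectRunner(testCmd string, exists func(name string) bool) (cmd string, ok bool) {
	if testCmd != "" {
		return testCmd, true
	}
	for _, s := range runnerSignals {
		if exists(s.file) {
			return s.cmd, true
		}
	}
	return "", false
}

// inDir reports whether a file name exists directly under root.
func inDir(root string) func(string) bool {
	return func(name string) bool {
		_, err := os.Stat(filepath.Join(root, name))
		return err == nil
	}
}

func main() {
	cmd, ok := detectRunner("", inDir("."))
	if !ok {
		fmt.Println(`status "error": test runner cannot be determined`)
		return
	}
	fmt.Println(cmd)
}
```

Passing the existence check as a function keeps the precedence logic testable without touching the filesystem.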

- [ ] **Step 2: Create `config/supervisor/tdd.md`**

```markdown
# TDD Skill

## Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

## Red phase

- Write exactly one test. One behavior. Name must describe the behavior clearly.
- Run the test suite. Confirm the test FAILS.
- If the test passes immediately: it tests existing behavior or is vacuous.
  Return status "fail" with message explaining why the test is wrong.
- Do not write any implementation code in this phase.

## Green phase

- Write the minimal code to make the failing test pass. Nothing more.
- YAGNI: no extra parameters, no future-proofing, no clever abstractions.
- Run the test suite. Confirm it PASSES.
- If tests fail: fix the implementation, not the test. Max 3 attempts.

## Refactor phase

- Improve structure, naming, or clarity only. No new behavior.
- Tests must remain green after every change.
- If tests break during refactor: revert that change, return status "fail".
```
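
The two prompt files together imply a simple rule for `verified`: the red phase expects the suite to fail, while green and refactor expect it to pass, and only the runner's exit code decides. As a sketch of that rule, with hypothetical function names and the assumption that exit code 0 means all tests passed:

```go
package main

import "fmt"

// wantPass reports whether a phase expects the test suite to pass.
// Red expects a failing suite; green and refactor expect a passing one.
func wantPass(phase string) (bool, error) {
	switch phase {
	case "red":
		return false, nil
	case "green", "refactor":
		return true, nil
	default:
		return false, fmt.Errorf("unknown phase %q", phase)
	}
}

// verified compares the runner's exit code with the phase expectation.
// Assumption: exit code 0 means every test passed.
func verified(phase string, exitCode int) (bool, error) {
	pass, err := wantPass(phase)
	if err != nil {
		return false, err
	}
	return (exitCode == 0) == pass, nil
}

func main() {
	cases := []struct {
		phase string
		exit  int
	}{
		{"red", 1}, {"red", 0}, {"green", 0}, {"refactor", 2},
	}
	for _, c := range cases {
		ok, err := verified(c.phase, c.exit)
		fmt.Printf("phase=%s exit=%d verified=%t err=%v\n", c.phase, c.exit, ok, err)
	}
}
```

One caveat: a compile error also yields a non-zero exit code, so in the red phase an exit-code check alone cannot distinguish "the new test fails" from "the test file does not build". The prompt leaves that judgment to the agent.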

- [ ] **Step 3: Create `.env.example`**

```bash
# Supervisor MCP server
SUPERVISOR_PORT=3200
SUPERVISOR_CONFIG_DIR=./config/supervisor
SUPERVISOR_MODELS_FILE=./config/models.yaml

# LiteLLM gateway (iguana)
LITELLM_BASE_URL=http://iguana:4000
LITELLM_API_KEY=your-litellm-master-key
```

- [ ] **Step 4: Run all tests, still pass**

```bash
task check
```
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add config/ .env.example
git commit -m "feat: add supervisor CLAUDE.md, tdd skill prompt, and env example"
```

---

## Task 11: Taskfile + MCP registration

**Files:**
- Modify: `Taskfile.yml`
- Modify: `.context/mcp.json`

- [ ] **Step 1: Add tasks to `Taskfile.yml`**

Add these tasks under the `tasks:` key of the existing `Taskfile.yml`:
```yaml
  supervisor:dev:
    desc: Run supervisor MCP server (development)
    cmds:
      - go run ./cmd/supervisor

  supervisor:build:
    desc: Build supervisor binary
    cmds:
      - go build -o bin/supervisor ./cmd/supervisor

  supervisor:test:smoke:
    desc: Smoke test supervisor via MCP (requires supervisor:dev running)
    cmds:
      - |
        curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \
          -H "Content-Type: application/json" \
          -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .
```

- [ ] **Step 2: Register supervisor in `.context/mcp.json`**

Replace contents of `.context/mcp.json`:
```json
{
  "mcpServers": {
    "knowledge": {
      "url": "http://localhost:3100/mcp",
      "description": "Project knowledge base: vector + graph retrieval"
    },
    "supervisor": {
      "url": "http://localhost:3200/mcp",
      "description": "Skill workers: TDD (red/green/refactor), more coming"
    }
  }
}
```

- [ ] **Step 3: Sync context files**

```bash
task context:sync
```
Expected: CLAUDE.md, AGENTS.md updated with new MCP server entry

- [ ] **Step 4: Commit**

```bash
git add Taskfile.yml .context/mcp.json CLAUDE.md AGENTS.md
git commit -m "feat: register supervisor MCP server and add task targets"
```

---

## Task 12: End-to-end smoke test

Manual verification that all layers connect.

- [ ] **Step 1: Start the supervisor**

In a terminal:
```bash
cp .env.example .env # fill in LITELLM_API_KEY if needed
task supervisor:dev
```
Expected: `{"time":"...","level":"INFO","msg":"supervisor starting","addr":":3200"}`

- [ ] **Step 2: Verify tools/list**

```bash
task supervisor:test:smoke
```
Expected: JSON response with `tdd_red`, `tdd_green`, `tdd_refactor` in `result.tools`

- [ ] **Step 3: Verify tdd_red responds**

```bash
curl -s -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{
      "name":"tdd_red",
      "arguments":{
        "project_root":"'"$PWD"'",
        "spec":"a function that returns the sum of two integers"
      }
    }
  }' | jq .
```
Expected: JSON-RPC response with `result.content[0].text` containing a valid supervisor Result JSON
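
The Result arrives doubly encoded: `result.content[0].text` is a JSON string whose contents are themselves JSON. A minimal sketch of unpacking it follows. The `Result` field names are taken from the fallback JSON in `config/supervisor/CLAUDE.md`, while `rpcReply` and `decodeResult` are hypothetical names used only here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result mirrors the field names of the fallback JSON in the supervisor prompt.
type Result struct {
	Status       string `json:"status"`
	Phase        string `json:"phase"`
	Skill        string `json:"skill"`
	FilePath     string `json:"file_path"`
	RunnerOutput string `json:"runner_output"`
	Verified     bool   `json:"verified"`
	ModelUsed    string `json:"model_used"`
	Message      string `json:"message"`
}

// rpcReply models only the parts of the JSON-RPC reply this sketch reads.
type rpcReply struct {
	Result struct {
		Content []struct {
			Type string `json:"type"`
			Text string `json:"text"`
		} `json:"content"`
	} `json:"result"`
	Error *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// decodeResult (hypothetical) unwraps the envelope, then parses the text payload.
func decodeResult(body []byte) (Result, error) {
	var reply rpcReply
	if err := json.Unmarshal(body, &reply); err != nil {
		return Result{}, fmt.Errorf("decode envelope: %w", err)
	}
	if reply.Error != nil {
		return Result{}, fmt.Errorf("rpc error %d: %s", reply.Error.Code, reply.Error.Message)
	}
	if len(reply.Result.Content) == 0 {
		return Result{}, fmt.Errorf("reply has no content items")
	}
	var res Result
	if err := json.Unmarshal([]byte(reply.Result.Content[0].Text), &res); err != nil {
		return Result{}, fmt.Errorf("decode result text: %w", err)
	}
	return res, nil
}

func main() {
	// Hand-written sample reply, not captured from a real run.
	body := []byte(`{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"{\"status\":\"error\",\"verified\":false,\"message\":\"no runner\"}"}]}}`)
	res, err := decodeResult(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(res.Status, res.Verified, res.Message)
}
```

On the command line the same unwrapping can be done by piping the `curl` output through `jq -r '.result.content[0].text' | jq .`.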

- [ ] **Step 4: Reload Claude Code MCP config**

In Claude Code, run `/mcp` to verify the supervisor server appears and its tools are listed.

- [ ] **Step 5: Final commit**

```bash
git add -p # stage any notes or fixes from smoke test
git commit -m "chore: smoke test complete, supervisor MCP server operational"
```

---

## go.sum

After all `go get` commands in the tasks above, run:
```bash
go mod tidy
git add go.mod go.sum
git commit -m "chore: tidy go.sum"
```